Abstract

We attempt to unveil the fine structure of volatility feedback effects in the context of 
general quadratic autoregressive (QARCH) models, which assume that today's volatility can 
be expressed as a general quadratic form of the past daily returns. The standard ARCH 
or GARCH framework is recovered when the quadratic kernel is diagonal. The calibration 
of these models on US stock returns reveals several unexpected features. The off-diagonal 
(non-ARCH) coefficients of the quadratic kernel are found to be highly significant both In-Sample
and Out-of-Sample, but all these coefficients turn out to be one order of magnitude
smaller than the diagonal elements. This confirms that daily returns play a special role in the 
volatility feedback mechanism, as postulated by ARCH models. The feedback kernel exhibits 
a surprisingly complex structure, incompatible with models proposed so far in the literature. 
Its spectral properties suggest the existence of volatility-neutral patterns of past returns. The 
diagonal part of the quadratic kernel is found to decay as a power-law of the lag, in line 
with the long-memory of volatility. Finally, QARCH models suggest some violations of Time 
Reversal Symmetry in financial time series, which are indeed observed empirically, although of 
much smaller amplitude than predicted. We speculate that a faithful volatility model should 
include both ARCH feedback effects and a stochastic component. 

1 Introduction

One of the most striking universal stylized facts of financial returns is the volatility clustering effect,
which was first reported by Mandelbrot as early as 1963 [31]. He noted that "... large changes tend
to be followed by large changes, of either sign, and small changes tend to be followed by small
changes." The first quantitative description of this effect was the ARCH model proposed by Engle
in 1982 [18]. It formalizes Mandelbrot's hunch in the simplest possible way, by postulating that
returns $r_t$ are conditionally Gaussian random variables, with a time-dependent volatility (rms) $\sigma_t$
that evolves according to:

$$\sigma_t^2 = s^2 + g\, r_{t-1}^2. \tag{1}$$

In words, this equation means that the (squared) volatility today is equal to a baseline level $s^2$, plus
a self-exciting term that describes the influence of yesterday's perceived volatility $r_{t-1}^2$ on today's
activity, through a feedback parameter $g$. Note that this ARCH model was primarily thought of
as an econometric model that needs to be calibrated on data, while a more ambitious goal would
be to derive such a model from a more fundamental theory, for example one based on behavioural
reactions to perceived risk.
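As an illustration of Eq. (1), the following minimal sketch simulates an ARCH(1) process with Gaussian noise and compares the sample variance of the returns with the stationary value $s^2/(1-g)$. The function name `simulate_arch1`, the parameter values and the random seed are assumptions chosen for illustration only; they do not come from the calibration discussed in this paper.

```python
import numpy as np

def simulate_arch1(s2, g, n_steps, seed=0):
    """Simulate r_t = sigma_t * xi_t with sigma_t^2 = s2 + g * r_{t-1}^2 (Gaussian xi)."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n_steps)
    r = np.zeros(n_steps)
    sigma2 = np.zeros(n_steps)
    prev_r2 = s2 / (1.0 - g)  # start at the stationary variance (requires g < 1)
    for t in range(n_steps):
        sigma2[t] = s2 + g * prev_r2          # Eq. (1)
        r[t] = np.sqrt(sigma2[t]) * xi[t]
        prev_r2 = r[t] ** 2
    return r, sigma2

# Illustrative parameters: stationary variance s2 / (1 - g) = 1
r, sigma2 = simulate_arch1(s2=0.7, g=0.3, n_steps=200_000)
print("sample variance:", r.var(), "stationary value:", 0.7 / (1 - 0.3))
```

The squared volatility never falls below the baseline $s^2$, and bursts of large $|r_t|$ raise it on the following day, which is the clustering mechanism described above.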

It soon became apparent that the above model is far too simple to account for empirical data. 
For one thing, it is unable to account for the long memory nature of volatility fluctuations. It is 
also arbitrary in at least two ways: 

• First, there is no reason to limit the feedback effect to the previous day only. The Generalized
ARCH model (GARCH) [8], which has become a classic in quantitative finance, replaces
$r_{t-1}^2$ by an exponential moving average of past squared returns. Obviously, one can further
replace the exponential moving average by any positive weighting kernel $k(\tau)$, leading to a
large family of models such that:

$$\sigma_t^2 = s^2 + \sum_{\tau=1}^{\infty} k(\tau)\, r_{t-\tau}^2, \tag{2}$$

which includes all ARCH and GARCH models. For example, ARCH($q$) corresponds to a
kernel $k(\tau)$ that is strictly zero beyond $\tau = q$. A slowly (power-law) decaying kernel $k(\tau)$ is
indeed able to account for the long memory of volatility; this corresponds to the so-called
FIGARCH model (for Fractionally Integrated GARCH) [9].

• Second, there is no a priori reason to single out the day as the only time scale to define the 
returns. In principle, returns over different time scales could also feedback on the volatility 
today [33, 10, 30], leading to another natural extension of the GARCH model as:

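$$\sigma_t^2 = s^2 + \sum_{\ell,\tau=1}^{\infty} g_\ell(\tau)\,\big[R^{(\ell)}_{t-\tau}\big]^2, \tag{3}$$

where $R^{(\ell)}_t$ is the cumulative, $\ell$-day return between $t-\ell$ and $t$. The first model in that category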
is the HARCH model of the Olsen group [33], where the first "H" stands for Heterogeneous. 
The authors had in mind that different traders are sensitive to and react to returns on 
different time scales. Although this behavioural interpretation was clearly expressed, there 
has been no real attempt to formalize such an intuition beyond the hand-waving arguments 
given in [10].

The common point to the zoo of generalizations of the initial ARCH model is that the current
level of volatility $\sigma_t^2$ is expressed as a quadratic form of past realized returns. The most general
model of this kind, called QARCH (for Quadratic ARCH), is due to Sentana [39], and reads:

$$\sigma_t^2 = s^2 + \sum_{\tau=1}^{\infty} L(\tau)\, r_{t-\tau} + \sum_{\tau,\tau'=1}^{\infty} K(\tau,\tau')\, r_{t-\tau}\, r_{t-\tau'}, \tag{4}$$

where $L(\tau)$ and $K(\tau,\tau')$ are some kernels that should satisfy technical conditions for $\sigma_t^2$ to be always
positive (see below and [32]). The QARCH can be seen as a general model for the dependence
of $\sigma_t^2$ on all past returns $\{r_{t'}\}_{t'<t}$, truncated to second order. The linear contribution, which
involves $L(\tau)$, captures a possible dependence of the volatility on the sign of the past returns. For
example, negative past returns tend to induce larger volatility in the future — this is the well- 
known leverage effect [5J [TH 3] (see also [37] and references therein).¹ The quadratic contribution,
on the other hand, contains through the matrix $K(\tau,\tau')$ all ARCH models studied in the literature.
For example, ARCH($q$), GARCH and FIGARCH models all correspond to a purely diagonal kernel,

$$K(\tau,\tau') = k(\tau)\,\delta_{\tau,\tau'}.$$
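To make the structure of Eq. (4) concrete, here is a minimal sketch that evaluates the QARCH squared volatility for a finite horizon $q$, given the last $q$ returns. The function name `qarch_sigma2` and the numerical values are hypothetical; the convention that `past_returns[0]` holds $r_{t-1}$ is an assumption of this sketch.

```python
import numpy as np

def qarch_sigma2(s2, L, K, past_returns):
    """Squared volatility of Eq. (4), truncated at horizon q = len(past_returns).

    past_returns[tau - 1] is assumed to hold r_{t - tau}, for tau = 1..q.
    """
    r = np.asarray(past_returns, dtype=float)
    L = np.asarray(L, dtype=float)
    K = np.asarray(K, dtype=float)
    return s2 + L @ r + r @ K @ r

# A purely diagonal kernel with no leverage reduces to an ARCH(q) model
q = 3
k = np.array([0.3, 0.2, 0.1])
past = np.array([0.5, -1.0, 2.0])
diag_value = qarch_sigma2(0.4, np.zeros(q), np.diag(k), past)
arch_value = 0.4 + np.sum(k * past ** 2)
print(diag_value, arch_value)
```

With a diagonal $K$ and $L = 0$ the two printed numbers coincide, which is the statement that ARCH($q$) is the diagonal special case of QARCH.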

In view of the importance of ARCH modelling in finance, it is somewhat surprising that the 
general framework provided by QARCH has not been fully explored. Only versions with very 
short memories, corresponding to at most 2x2 matrices for K, seem to have been considered 
in the literature. In fact, Sentana's contribution is usually considered to be the introduction of 
the linear contribution in the GARCH framework, rather than unveiling the versatility of the 
quadratic structure of the model. The aim of the present paper is to explore in detail the QARCH 
framework, both from a theoretical and empirical point of view. Of particular interest is the 
empirical determination of the structure of the feedback kernel $K(\tau,\tau')$ for the daily returns of
stocks, which we compare with several proposals in the literature, including the multiscale model
of [10] and the trend-induced volatility model of [48]. Quite surprisingly, we find that while the
off-diagonal elements of $K(\tau,\tau')$ are significant, they are at least an order of magnitude smaller
than the diagonal elements $k(\tau) := K(\tau,\tau)$. The latter are found to decay very slowly with $\tau$, in

1 (G)QARCH and alternative names such as Asymmetric (G)ARCH, Nonlinear (G)ARCH, Augmented ARCH,
etc. often refer to this additional leverage (asymmetry) contribution, whereas the important innovation of QARCH 
is in fact the possibility of off-diagonal terms in the kernel K. 
agreement with previous discussions. Therefore, in a first approximation, the dominant feedback
effect comes from the amplitude of daily returns only, with minor corrections coming from returns 
computed on large time spans, at variance with the assumption of the model put forward in [10] . 
We believe that this finding is unexpected and far from trivial. It is a strong constraint on any 
attempt to justify the ARCH feedback mechanism from a more fundamental point of view. 

In parallel with ARCH modelling, stochastic volatility models represent another strand of 
the literature that has vigorously grown in the last twenty years. Here again, a whole slew of 
models has emerged [23], with the Heston model [26] and the SABR model [23] as the best known
examples. These models assume that the volatility itself is a random process, governed either 
by a stochastic differential equation (in time) or an explicit cascade construction in the case of 
more recent multifractal models [341 [Pol [29] (again initiated by Mandelbrot as early as 1974! [32]). 
There is however a fundamental difference between most of these stochastic volatility models and 
the ARCH framework: while the former are time reversal invariant (TRI), the latter is explicitly 
backward looking. This, as we shall discuss below, implies that certain correlation functions are 
not TRI within QARCH models, but are TRI within stochastic volatility models. This leads to an 
empirically testable prediction; we report below that TRI is indeed violated in stock markets, as 
also documented in [47].

The outline of this paper is as follows. We first review in Section 2 some general analytical 
properties of QARCH models, in particular about the existence of low moments of the volatility. 
We then introduce in Section 3 several different sub-families of QARCH, that we try to motivate 
intuitively. The consideration of these sub-families follows from the necessity of reducing the 
dimensionality of the problem, but also from the hope of finding simple regularities that would 
suggest a plausible interpretation (behavioural or else) of the model, beyond merely best fit criteria. 
In Section 4, we attempt to calibrate "large" QARCH models on individual stock returns, first 
without trying to impose any a priori structure on the kernel $K(\tau,\tau')$, and then specializing to the
various sub-families mentioned above. The same analysis is done in Section 5 for the returns of 
the stock index. We isolate in Section 6 the discussion on the issue of TRI for stock returns, both 
from a theoretical/modeling and an empirical point of view. We give our conclusions in Section 7, 
and relegate to an appendix more technical issues. 



2 General properties of QARCH models 

Some general properties of QARCH models are discussed in Sentana's seminal paper [39]. We
review them here and derive some new results. The QARCH model for the return at time $t$, $r_t$, is
such that:

$$\ln p_t - \ln p_{t-1} = r_t = \sigma_t\, \xi_t, \tag{5}$$

where $p_t$ is the price at time $t$, $\sigma_t$ is given by the QARCH specification, Eq. (4) above, while the
$\xi$'s are IID random variables, of zero mean and variance equal to unity. While many papers take
these $\xi$'s to be Gaussian, it is preferable to be agnostic about their univariate distribution. In fact,
several studies, including our own (see below), suggest that the $\xi$'s themselves have fat tails: asset
returns are not conditionally Gaussian and "true jumps" do occur.²

In this section, we will focus on the following non-linear correlation functions (other correlations 
will be considered below, when we turn to empirical studies): 

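$$\mathcal{C}^{(2)}(\tau) = \big\langle (r_t^2 - \langle r^2\rangle)\, r_{t-\tau}^2 \big\rangle_t \tag{6a}$$

$$\tilde{\mathcal{C}}^{(2)}(\tau) = \big\langle (\sigma_t^2 - \langle \sigma^2\rangle)\, r_{t-\tau}^2 \big\rangle_t \tag{6b}$$

$$\mathcal{D}(\tau',\tau'') = \big\langle (r_t^2 - \langle r^2\rangle)\, r_{t-\tau'}\, r_{t-\tau''} \big\rangle_t \tag{6c}$$

$$\tilde{\mathcal{D}}(\tau',\tau'') = \big\langle (\sigma_t^2 - \langle \sigma^2\rangle)\, r_{t-\tau'}\, r_{t-\tau''} \big\rangle_t. \tag{6d}$$

Here and below, we assume stationarity and correspondingly $\langle\ldots\rangle_t$ refers to a sliding average over
$t$. The following properties are worth noticing: by definition, $\mathcal{D}(\tau,\tau) = \mathcal{C}^{(2)}(\tau)$ and $\tilde{\mathcal{D}}(\tau,\tau) =
\tilde{\mathcal{C}}^{(2)}(\tau)$. Furthermore, whereas $\mathcal{C}^{(2)}(\tau) = \mathcal{C}^{(2)}(-\tau)$ by construction, the same is not true in general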


2 There seems to be a slowly growing consensus on this point (see e.g. pQ): Gaussian processes with stochastic 
volatility cannot alone account for the discontinuities observed in market prices. 
for $\tilde{\mathcal{C}}^{(2)}(\tau)$. However, using the QARCH causal construction and the independence of the $\xi$'s,
one can easily convince oneself that when $\tau > 0$, $\tilde{\mathcal{C}}^{(2)}(\tau) = \mathcal{C}^{(2)}(\tau)$. Similarly, for $\tau' > \tau'' > 0$,
$\tilde{\mathcal{D}}(\tau',\tau'') = \mathcal{D}(\tau',\tau'')$, while in general, $\mathcal{D}(\tau',\tau'') \neq \mathcal{D}(-\tau',-\tau'') = 0$.

2.1 Second moment of the volatility and stationarity 

QARCH models only make sense if the volatility does not diverge to infinity. The criterion for
stability is easy to establish if the $\xi$'s are IID and of zero mean, and reads:

$$\mathrm{Tr}\,K < 1. \tag{7}$$

In this case, the volatility is a stationary process such that $\langle\sigma^2\rangle = s^2/(1 - \mathrm{Tr}\,K)$: the feedback-induced
increase of the volatility only involves the diagonal elements of $K$. Note also that the
leverage kernel $L(\tau)$ does not appear in this equation. As an interesting example, we consider
kernels with a power-law decaying diagonal: $K(\tau,\tau) = g\,\tau^{-\alpha}\,\mathbb{1}_{\{\tau\le q\}}$. For a given $\alpha$, $g$ must be
smaller than a certain $g_c(\alpha,q)$ for $\langle\sigma^2\rangle$ to be finite. Fig. 1 shows the critical frontier $g_c(\alpha,q)$ for
$q = 1$ and $q \to \infty$. The critical frontier in the limit case $q = \infty$ is given by $g_c = 1/\zeta(\alpha)$, where $\zeta(\alpha)$
is Riemann's zeta function (dashed red). Note in particular that the model is always unstable when
$\alpha < 1$, i.e. when the memory of past realized volatility decays too slowly. At the other extreme,
$q = 1$, the constraint is well known to be $g = k(1) < 1$ (dot-dashed red).
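The stationarity frontier for the power-law diagonal can be evaluated directly from $\mathrm{Tr}\,K = g\sum_{\tau=1}^{q}\tau^{-\alpha} < 1$. The sketch below computes $g_c(\alpha,q) = 1/\sum_{\tau\le q}\tau^{-\alpha}$ by brute-force summation; the function name `critical_g` is hypothetical, and the large-$q$ value is only a numerical approximation of $1/\zeta(\alpha)$.

```python
import numpy as np

def critical_g(alpha, q):
    """Largest g such that Tr K = g * sum_{tau=1}^{q} tau^(-alpha) stays below 1."""
    taus = np.arange(1, q + 1, dtype=float)
    return 1.0 / np.sum(taus ** (-alpha))

print(critical_g(1.5, 1))        # q = 1: the ARCH(1) constraint g < 1
print(critical_g(2.0, 10**6))    # approaches 1 / zeta(2) = 6 / pi^2 for large q
print(critical_g(0.9, 10**6))    # keeps shrinking with q when alpha < 1
```

For $\alpha < 1$ the partial sums grow without bound, so $g_c$ tends to zero as $q$ increases, in line with the instability noted above.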

Note that within a strict interpretation of the QARCH model, there are additional constraints
on the kernels $K$ and $L$ that arise from the fact that $\sigma_t^2$ should be positive for any realization of
price returns. This imposes that a) all the eigenvalues of $K$ should be non-negative, and b) that
the following inequality holds:

$$\sum_{\tau,\tau'} L(\tau)\, K^{-1}(\tau,\tau')\, L(\tau') \le 4 s^2, \tag{8}$$

where $K^{-1}$ is the matrix inverse of $K$. However, these constraints might be too strong if one
interprets the QARCH model as a generic expansion of $\sigma_t^2$ in powers of past returns, truncated to
second order [39]. It could well be that higher order terms are stabilizing and lead to a meaningful,
stable model beyond the limits quoted here.
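The two positivity conditions can be checked numerically for a given pair of kernels. The following sketch is one possible implementation, under the assumption that $K$ is symmetric and invertible (the function name `qarch_is_positive` and the tolerance are hypothetical). Condition (8) follows from minimizing the quadratic form $s^2 + L\cdot r + r^{\top} K r$ over $r$, whose minimum is $s^2 - L^{\top}K^{-1}L/4$.

```python
import numpy as np

def qarch_is_positive(s2, L, K, tol=1e-12):
    """Check (a) eigenvalues of K non-negative and (b) L^T K^{-1} L <= 4 s2.

    Assumes K is symmetric and invertible (assumption of this sketch).
    """
    L = np.asarray(L, dtype=float)
    K = np.asarray(K, dtype=float)
    if np.linalg.eigvalsh(K).min() < -tol:
        return False                       # condition (a) violated
    quad = L @ np.linalg.solve(K, L)       # L^T K^{-1} L without forming the inverse
    return bool(quad <= 4.0 * s2)          # condition (b), Eq. (8)

K = np.eye(3)
print(qarch_is_positive(1.0, np.zeros(3), K))            # no leverage: always fine
print(qarch_is_positive(1.0, np.array([3.0, 0, 0]), K))  # leverage too strong
```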



2.2 Fourth moment of the volatility 

The existence of higher moments of $\sigma$ can also be analyzed, leading to more and more cumbersome
algebra. In view of its importance, we have studied in detail the conditions for the existence of
the fourth moment of $\sigma$, which allows one to characterize the excess kurtosis $\kappa$ of the returns,
traditionally defined as:

$$\kappa = \frac{\langle r^4\rangle}{\langle r^2\rangle^2} - 3 = \frac{\langle\sigma^4\rangle\,\langle\xi^4\rangle}{\langle\sigma^2\rangle^2} - 3. \tag{9}$$



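In the general case, $\langle\sigma^4\rangle$, $\mathcal{C}^{(2)}(\tau)$ and $\mathcal{D}(\tau',\tau'')$ are related by the following set of self-consistent
equations:

$$\langle\sigma^4\rangle - \langle\sigma^2\rangle^2 = \langle\sigma^2\rangle^2\big(\mathrm{Tr}(K^2) - \mathrm{Tr}(K^{\bullet 2})\big) + \sum_{\tau>0} K(\tau,\tau)\,\mathcal{C}^{(2)}(\tau) + 2\sum_{0<\tau_2<\tau_1} K(\tau_1,\tau_2)\Big[\mathcal{D}(\tau_1,\tau_2) - \sum_{0<\tau<\tau_2} K(\tau,\tau)\,\mathcal{D}(\tau_1-\tau,\tau_2-\tau)\Big] \tag{10a}$$

$$\mathcal{C}^{(2)}(\tau>0) = K(\tau,\tau)\big(\langle\sigma^4\rangle\langle\xi^4\rangle - \langle\sigma^2\rangle^2\big) + \sum_{\tau'\neq\tau} K(\tau',\tau')\,\mathcal{C}^{(2)}(\tau-\tau') + 2\sum_{\tau'>\tau''>\tau>0} K(\tau',\tau'')\,\mathcal{D}(\tau'-\tau,\tau''-\tau) \tag{10b}$$

$$\mathcal{D}(\tau_1>0,\tau_2>0) = 2K(\tau_1,\tau_2)\big(\mathcal{C}^{(2)}(\tau_1-\tau_2) + \langle\sigma^2\rangle^2\big) + \sum_{0<\tau'<\tau_2} K(\tau',\tau')\,\mathcal{D}(\tau_1-\tau',\tau_2-\tau') + 2\sum_{\tau'>\tau_2,\ \tau'\neq\tau_1} K(\tau',\tau_2)\,\mathcal{D}(\tau'-\tau_2,|\tau_1-\tau_2|), \tag{10c}$$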
Figure 1: Critical frontier in the parameter space for $K(\tau,\tau) = g\,\tau^{-\alpha}\,\mathbb{1}_{\{\tau\le q\}}$, according to the
finiteness of $\langle\sigma^2\rangle$ (red) and $\langle\sigma^4\rangle$ (blue). Above the red line, both $\langle\sigma^2\rangle$ and $\langle\sigma^4\rangle$ diverge. In the
wedge between the blue and red lines, $\langle\sigma^2\rangle < \infty$ while $\langle\sigma^4\rangle$ diverges.

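where we assume for simplicity here that the leverage effect is absent, i.e. $L(\tau) = 0$, and $K^{\bullet 2}$ means
the square of $K$ in the Hadamard sense (i.e. element by element). For a QARCH with maximum
horizon $q$, we have thus a set of $1 + q + q(q-1)/2$ linear equations that can be numerically solved
for an arbitrary choice of the kernel $K$. These equations simplify somewhat in the case of a purely
diagonal kernel $K(\tau,\tau') = k(\tau)\,\delta_{\tau,\tau'}$. One finds:

$$\langle\sigma^4\rangle = \langle\sigma^2\rangle^2 + \sum_{\tau>0} k(\tau)\,\mathcal{C}^{(2)}(\tau) \tag{11a}$$

$$\mathcal{C}^{(2)}(\tau) = k(\tau)\big(\langle\sigma^4\rangle\langle\xi^4\rangle - \langle\sigma^2\rangle^2\big) + \sum_{\tau'\neq\tau>0} k(\tau')\,\mathcal{C}^{(2)}(\tau-\tau'). \tag{11b}$$

By substituting $\langle\sigma^4\rangle$, it is easy to write the linear system explicitly in matrix form, $V\,\mathcal{C}^{(2)} = S$, with

$$V(\tau,\tau') = \delta_{\tau,\tau'} - \langle\xi^4\rangle\, k(\tau)\,k(\tau') - \big[k(\tau-\tau') + k(\tau+\tau')\big]$$

$$S(\tau) = k(\tau)\,\langle\sigma^2\rangle^2\,\big(\langle\xi^4\rangle - 1\big)$$

with the convention that $k(\tau) = 0$, $\forall \tau \le 0$.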
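The linear system $V\,\mathcal{C}^{(2)} = S$ for a diagonal kernel is small enough to be solved directly. The sketch below builds $V$ and $S$ for an ARCH($q$) kernel and returns $\langle\sigma^4\rangle$ from Eq. (11a). The function name `arch_fourth_moment` is hypothetical, and the output is only meaningful inside the region where the fourth moment is finite ($\det V > 0$ when approached from small kernels).

```python
import numpy as np

def arch_fourth_moment(k, xi4=3.0, sigma2_mean=1.0):
    """Solve V C2 = S for a diagonal kernel k(1..q); return (<sigma^4>, C2, V)."""
    k = np.asarray(k, dtype=float)
    q = len(k)

    def kk(n):
        # convention: k(n) = 0 outside 1..q (in particular for n <= 0)
        return k[n - 1] if 1 <= n <= q else 0.0

    V = np.empty((q, q))
    for i in range(1, q + 1):
        for j in range(1, q + 1):
            V[i - 1, j - 1] = (i == j) - xi4 * kk(i) * kk(j) - (kk(i - j) + kk(i + j))
    S = k * sigma2_mean ** 2 * (xi4 - 1.0)
    C2 = np.linalg.solve(V, S)
    sigma4 = sigma2_mean ** 2 + k @ C2      # Eq. (11a)
    return sigma4, C2, V

sigma4, C2, V = arch_fourth_moment([0.5])   # ARCH(1), Gaussian noise, <sigma^2> = 1
print(sigma4)                               # closed form (1 - k^2) / (1 - 3 k^2) = 3
```

The excess kurtosis of Eq. (9) then follows as `xi4 * sigma4 / sigma2_mean**2 - 3`.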

Let us examine this in more detail for ARCH($q$). For simplicity, we assume here that $\xi$ is
Gaussian ($\langle\xi^4\rangle = 3$) and $s$ is chosen such that $\langle\sigma^2\rangle = 1$. The condition for which $\langle\sigma^4\rangle$ first diverges
is given by $\det V = 0$. For different $q$'s, this reads:

• for $q = 1$, one recovers the well-known result that ARCH(1) has a finite fourth moment only
when $k_1 < 1/\sqrt{3}$.

• for $q = 2$, the stability line is given by $k_1 + k_2 = 1$, while the existence of a finite fourth
moment is given by the condition $k_1^2 < (1/3 - k_2^2)(1 - k_2)/(1 + k_2)$.

• for $q \to \infty$, we again assume the $\tau$ dependence of $k(\tau)$ to be a power-law, $g\,\tau^{-\alpha}$ (corresponding
to the FIGARCH model). The critical line for which the fourth moment diverges
is shown in blue in Fig. 1. After a careful extrapolation to $q = \infty$, we find that whenever
$1 < \alpha < \alpha_c \approx 1.376$, the fourth moment exists as soon as the model is stationary, i.e. when
$g < 1/\zeta(\alpha) < 1/\zeta(\alpha_c) \approx 0.306$.
Figure 2: Matrix structures. (a) Overlapping two-scales; (b) Borland-Bouchaud multiscale; (c)
Zumbach; (d) Long-Trend.



The last result is quite interesting and can be understood from Eq. (11), which shows that to
lowest order in $g$, one has:

$$\frac{\langle\sigma^4\rangle}{\langle\sigma^2\rangle^2} - 1 \approx \big(\langle\xi^4\rangle - 1\big)\, g^2 \sum_{\tau>0} \frac{1}{\tau^{2\alpha}}. \tag{12}$$

The above expression only diverges if $\alpha < 1/2$, but this is far in the forbidden region $\alpha < 1$ where
$\langle\sigma^2\rangle$ itself diverges. Therefore, perhaps unexpectedly, a FIGARCH model with a long memory (i.e.
$\alpha < 1.376$) cannot lead to a large kurtosis of the returns, unless the $\xi$ variables have themselves fat
tails. We will come back to this important point below.



3 Some special families of QARCH models 

As we alluded to in the introduction, ARCH(q) models posit that today's volatility is only sensitive 
to past daily returns. This assumption can be relaxed in several natural ways, each of which leads
to a specific structure of the feedback kernel $K$. We will present these extensions in increasing
order of complexity. 

3.1 Returns over different time scales 

Let us define the $\ell$-day return between $t-\ell$ and $t$ as $R_t^{(\ell)}$, such that:

$$R_t^{(\ell)} = \sum_{\tau=1}^{\ell} r_{t-\tau} = \ln p_{t-1} - \ln p_{t-\ell-1}, \tag{13}$$

where $p_t$ is the price at time $t$. The simplest extension of ARCH($q$) is to allow all past 2-day
returns to play a role as well, i.e.:

$$\sigma_t^2 = s^2 + \sum_{\tau=0}^{q-1} g_1(\tau)\,\big[R^{(1)}_{t-\tau}\big]^2 + \sum_{\tau=0}^{q-2} g_2(\tau)\,\big[R^{(2)}_{t-\tau}\big]^2, \tag{14}$$

where $g_1(\tau)$ and $g_2(\tau)$ are coefficients, all together $2q - 1$ of them. Upon identification with the
QARCH kernel, one finds:

$$K(\tau,\tau) = g_1(\tau-1) + g_2(\tau-1) + g_2(\tau-2),$$
$$K(\tau,\tau+1) = g_2(\tau-1), \tag{15}$$
$$K(\tau,\tau+\ell) = 0 \quad \text{for } \ell \ge 2,$$

with the convention that $g_2(-1) = 0$. The model can thus be re-interpreted in the following way:
the square volatility is still a weighted sum of past daily squared returns, but there is an extra
contribution that picks up the realized one-day covariance of returns. If $g_2(\tau) > 0$, the model
means that the persistence of the same trend two days in a row leads to increased volatilities. A
schematic representation of this model is given in Fig. 2(a).
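The identification between the two-scale model and the QARCH kernel can be verified numerically. The sketch below builds the tridiagonal kernel from coefficients $g_1$ and $g_2$ and compares the resulting quadratic form with a direct evaluation in terms of one-day and two-day returns. It assumes the convention that an $\ell$-day return dated $t$ aggregates $r_{t-1},\dots,r_{t-\ell}$; the function name `two_scale_kernel` and the random test values are hypothetical.

```python
import numpy as np

def two_scale_kernel(g1, g2):
    """Kernel K (q x q) for daily plus 2-day returns; g1 has q entries, g2 has q - 1."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    q = len(g1)
    assert len(g2) == q - 1

    def g2p(n):
        # g2(n) with the convention g2 = 0 outside 0..q-2
        return g2[n] if 0 <= n < q - 1 else 0.0

    K = np.zeros((q, q))
    for tau in range(1, q + 1):
        K[tau - 1, tau - 1] = g1[tau - 1] + g2p(tau - 1) + g2p(tau - 2)
        if tau < q:
            K[tau - 1, tau] = K[tau, tau - 1] = g2p(tau - 1)
    return K

rng = np.random.default_rng(0)
q = 6
g1, g2 = rng.uniform(0, 0.1, q), rng.uniform(0, 0.05, q - 1)
past = rng.normal(size=q)                    # past[i] = r_{t-1-i}
direct = np.sum(g1 * past ** 2) + np.sum(g2 * (past[:-1] + past[1:]) ** 2)
via_kernel = past @ two_scale_kernel(g1, g2) @ past
print(direct, via_kernel)
```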

One can of course generalize again the above model to include 2-day, 3-day, ..., $\ell$-day returns, with
more coefficients $g_1(\tau), g_2(\tau), \ldots, g_\ell(\tau)$, with a total of $\ell(2q + 1 - \ell)/2$ parameters. Obviously, when
$\ell = q$, all possible time scales are exhausted, and the number of free parameters is $q(q+1)/2$, i.e.
exactly the number of independent entries of the symmetric $q \times q$ feedback kernel $K$.

3.2 Multiscale, cumulative returns 

Another model, proposed in [33, 10], is motivated by the idea that traders may be influenced not
only by yesterday's return, but also by the change of price between today and 5 days ago (for
weekly traders), or today and 20 days ago (for monthly traders), etc. In this case, the natural
extension of the ARCH framework is to write:

$$\sigma_t^2 = s^2 + \sum_{\ell=1}^{q} g_{BB}(\ell)\,\big[R_t^{(\ell)}\big]^2, \tag{16}$$

where the index BB refers to the model put forward in [10]. The BB model requires a priori $q$
different parameters. It is simple to see that in this case, the kernel matrix can be expressed as:

$$K_{BB}(\tau',\tau'') = G[\max(\tau',\tau'')], \quad \text{with} \quad G[\tau] = \sum_{\ell=\tau}^{q} g_{BB}(\ell). \tag{17}$$

The spectral properties of these matrices are investigated in detail in Appendix A. One can of
course consider a mixed model where both cumulative returns and daily returns play a role. This
amounts to taking the off-diagonal elements of $K$ as prescribed by the above equation, but to
specify the diagonal elements $K(\tau,\tau)$ completely independently from $G[\tau]$. This leads to a matrix
structure schematically represented in Fig. 2(b), parameterized by $2q-1$ independent coefficients.
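The BB kernel depends only on the larger of its two lags, which makes it easy to build from a reversed cumulative sum. The sketch below constructs it and checks the quadratic form against a direct sum over cumulative returns, assuming that the $\ell$-day return aggregates the $\ell$ most recent past daily returns. The function name `bb_kernel` and the random inputs are hypothetical.

```python
import numpy as np

def bb_kernel(g_bb):
    """K(tau', tau'') = G[max(tau', tau'')], with G[tau] = sum_{l >= tau} g_bb(l)."""
    g_bb = np.asarray(g_bb, dtype=float)
    q = len(g_bb)
    G = np.cumsum(g_bb[::-1])[::-1]          # G[tau - 1] = sum_{l = tau}^{q} g_bb(l)
    idx = np.arange(q)
    return G[np.maximum.outer(idx, idx)]

rng = np.random.default_rng(1)
q = 8
g_bb = rng.uniform(0, 0.05, q)
past = rng.normal(size=q)                    # past[i] = r_{t-1-i}
cumulative = np.cumsum(past)                 # cumulative[l - 1] = l-day return
direct = np.sum(g_bb * cumulative ** 2)
via_kernel = past @ bb_kernel(g_bb) @ past
print(direct, via_kernel)
```

Because every entry of the last row and column equals $g_{BB}(q)$ and the entries grow towards the top-left corner, the off-diagonal elements of a pure BB kernel are of the same order as its diagonal.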



3.3 Zumbach's trend effect (ARTCH) 

Zumbach's model [48] is another particular case in the class of models described by Eq. (4).
It involves returns over different lengths of time and characterizes the effect of past trending
aggregated returns on future volatility. The quadratic part in the volatility prediction model is:

$$\text{ARCH} + \sum_{\ell=1}^{\lfloor q/2\rfloor} g_Z(\ell)\, R_t^{(\ell)}\, R_{t-\ell}^{(\ell)}, \tag{18}$$

where $\ell$ is again capturing specific relevant time scales like the day ($\ell = 1$), the week ($\ell = 5$), the
month ($\ell = 20$), etc. The off-diagonal elements of the kernel $K$ now take the following form:

$$K(\tau',\tau''>\tau') = \sum_{\ell=\max(\tau',\lceil \tau''/2\rceil)}^{\min(\tau''-1,\lfloor q/2\rfloor)} g_Z(\ell). \tag{19}$$

Since it is upper triangular by construction, we symmetrize it as $\tfrac{1}{2}(K + K^{\top})$, and the diagonal is
filled with the ARCH parameters. This model contains $q + \lfloor q/2\rfloor$ independent coefficients, and is
schematically represented in Fig. 2(c).
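The trend kernel can be checked in the same way as the previous families. The sketch below fills the upper triangle with sums of the trend coefficients over the admissible scales, symmetrizes, adds the ARCH diagonal, and compares with a direct evaluation of the product of two consecutive $\ell$-day returns. It assumes that the $\ell$-day return dated $t$ aggregates $r_{t-1},\dots,r_{t-\ell}$; the name `zumbach_kernel` and the random inputs are hypothetical.

```python
import numpy as np

def zumbach_kernel(k_arch, g_z):
    """ARCH diagonal plus symmetrized trend terms for scales l = 1..floor(q/2)."""
    k_arch = np.asarray(k_arch, dtype=float)
    g_z = np.asarray(g_z, dtype=float)
    q, half = len(k_arch), len(k_arch) // 2
    assert len(g_z) == half
    K_up = np.zeros((q, q))
    for t1 in range(1, q + 1):
        for t2 in range(t1 + 1, q + 1):
            lo = max(t1, (t2 + 1) // 2)      # l >= tau' and l >= ceil(tau'' / 2)
            hi = min(t2 - 1, half)           # l <= tau'' - 1 and l <= floor(q / 2)
            if lo <= hi:
                K_up[t1 - 1, t2 - 1] = g_z[lo - 1:hi].sum()
    return 0.5 * (K_up + K_up.T) + np.diag(k_arch)

rng = np.random.default_rng(2)
q = 9
k_arch, g_z = rng.uniform(0, 0.1, q), rng.normal(0, 0.02, q // 2)
past = rng.normal(size=q)                    # past[i] = r_{t-1-i}
direct = np.sum(k_arch * past ** 2) + sum(
    g_z[l - 1] * past[:l].sum() * past[l:2 * l].sum() for l in range(1, q // 2 + 1)
)
via_kernel = past @ zumbach_kernel(k_arch, g_z) @ past
print(direct, via_kernel)
```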

3.4 A generalized trend effect 

In Zumbach's model, the trend component is defined by comparing returns computed over the
same horizon $\ell$. This of course is not necessary. As an extreme alternative model, we consider a
model where the volatility today is affected by whether the last return $r_{t-1}$ confirms (or negates)
a long trend. In more formal terms, this writes:

$$\text{ARCH} + r_{t-1} \times \sum_{\ell=1}^{q-1} g_{LT}(\ell)\, r_{t-1-\ell}, \tag{20}$$

where $g_{LT}(\ell)$ is the sequence of weights that defines the past "long trend" (hence the index LT).
This now corresponds to a kernel $K$ with diagonal elements corresponding to the ARCH effects and
a single non-trivial row (and column) corresponding to the trend effect: $K(1,\tau>1) = g_{LT}(\tau-1)$.
This model has again $2q - 1$ independent parameters.

Of course, one can consider QARCH models that encode some, or all, of the above mechanisms;
for example, a model that schematically reads ARCH + BB + LT would require $3q - 2$ parameters.

3.5 Spectral interpretation of the QARCH 

Another illuminating way to interpret QARCH models is to work in the diagonal basis of the $K$
matrix, where the quadratic term in (4) reads:

$$\sum_{\tau',\tau''=1}^{q} \Big(\sum_n \lambda_n\, v_n(\tau')\, v_n(\tau'')\Big)\, r_{t-\tau'}\, r_{t-\tau''} = \sum_n \lambda_n\, \langle r|v_n\rangle_t^2, \tag{21}$$

with $(\lambda_n, v_n)$ being, respectively, the $n$-th eigenvalue and eigenvector of $K$, and $\langle r|v_n\rangle_t = \sum_{\tau=1}^{q} v_n(\tau)\, r_{t-\tau}$
the projection of the pattern created by the last $q$ returns on the $n$-th eigenvector. One can therefore
say that the square volatility $\sigma_t^2$ picks up contributions from various past returns modes. The
modes associated to the largest eigenvalues $\lambda$ are those which have the largest contribution to
volatility spikes.

The ARCH($q$) model corresponds to a diagonal matrix; in this case the modes are trivially individual
daily returns. Another trivial case is when $K$ is of rank one and its spectral decomposition
is simply

$$K(\tau',\tau'') = \lambda\, v(\tau')\, v(\tau''), \tag{22}$$

where $\lambda = \mathrm{Tr}(K)$ is the only non-null eigenvalue, and $v(\tau) = \sqrt{K(\tau,\tau)/\mathrm{Tr}(K)}$ the eigenvector
associated with this non-degenerate mode. The corresponding contribution to the increase in
volatility (21) is therefore

$$\lambda R_t^2, \qquad R_t = \langle r|v\rangle_t = \sum_{\tau} v(\tau)\, r_{t-\tau}, \tag{23}$$

where $R_t$ can be interpreted as an average return over the whole period, with a certain set of
weights $v(\tau)$.
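The spectral reading of the quadratic term is easy to reproduce numerically. The sketch below diagonalizes a symmetric kernel with `numpy.linalg.eigh`, projects the vector of past returns on each eigenvector, and checks that the eigenvalue-weighted sum of squared projections equals the quadratic form. The random kernel is only a placeholder and the function name `volatility_modes` is hypothetical.

```python
import numpy as np

def volatility_modes(K, past_returns):
    """Eigenvalues of K and projections <r|v_n> of the past-return pattern."""
    K = np.asarray(K, dtype=float)
    r = np.asarray(past_returns, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(K)   # columns are the eigenvectors v_n
    projections = eigenvectors.T @ r
    return eigenvalues, projections

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
K = 0.5 * (A + A.T)                                 # placeholder symmetric kernel
past = rng.normal(size=5)                           # past[i] = r_{t-1-i}
lam, proj = volatility_modes(K, past)
quadratic_term = past @ K @ past
mode_sum = np.sum(lam * proj ** 2)
print(quadratic_term, mode_sum)
```

A pattern of past returns orthogonal to all eigenvectors with non-zero eigenvalue leaves the quadratic term unchanged, which is the sense in which a kernel can have volatility-neutral modes.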

The pure BB model (without extra ARCH contributions) can also be fully diagonalized in the
large $q$ limit for certain choices of the function $g_{BB}(\ell)$. We detail these calculations (which are
mostly of theoretical interest) in Appendix A.

4 Empirical study: single stocks 

We now turn to the empirical calibration of "large" QARCH models, i.e. models that take into
account $q$ past returns with $q$ large (20 or more). The difficulty of course is that the full calibration
of the matrix $K$ requires the determination of $q(q+1)/2$ parameters, which is already equal to 210
when $q = 20$! This is why the discussion of the previous section is important: imposing some a
priori structure on the matrix $K$ may help limit the number of parameters, and gain robustness
and transparency. However, perhaps surprisingly, we will find that none of the above models
seems to have enough flexibility to reproduce the subtle structure of the empirically determined $K$
matrix.



4.1 Dataset and methodology 

Equation (4) is a prediction model for the predicted variable $\sigma_t$ with explanatory variables past
returns $r$ at all lags. The dataset we will use to calibrate the model is composed of daily stock
prices (Open, Close, High, Low) for $N = 280$ names present in the S&P-500 index from Jan. 2000
to Dec. 2009 ($T = 2515$ days), without interruption. The reference price for day $t$ is defined to be
the close of that day $C_t$, and the return $r_t$ is given by $r_t = \ln C_t - \ln C_{t-1}$. The true volatility is
of course unobservable; we replace it by the standard Rogers-Satchell (RS) estimator [551 12"2"] : 

$$\sigma_t^2 = \ln(H_t/O_t)\,\ln(H_t/C_t) + \ln(L_t/O_t)\,\ln(L_t/C_t). \tag{24}$$
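The Rogers-Satchell estimator only needs the four daily prices. A minimal sketch is given below; the function name `rogers_satchell` and the example bar are hypothetical. It works element-wise on NumPy arrays, so it can be applied to a whole panel of prices at once.

```python
import numpy as np

def rogers_satchell(open_, high, low, close):
    """Rogers-Satchell proxy for the daily squared volatility, Eq. (24)."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (open_, high, low, close))
    return np.log(h / o) * np.log(h / c) + np.log(l / o) * np.log(l / c)

# Illustrative daily bar: O = 100, H = 110, L = 95, C = 105
rs_var = rogers_satchell(100.0, 110.0, 95.0, 105.0)
print(rs_var)
```

Both products are non-negative whenever $L_t \le O_t, C_t \le H_t$, so the proxy is non-negative for any consistent bar and vanishes when the price does not move during the day.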

As always in this kind of study over extended periods of time, our dataset suffers from a selection
bias since we have retained only those stock names that have remained alive over the whole period. 
There are several further methodological points that we need to discuss right away: 



Universality hypothesis. We assume that the feedback matrix K and the leverage kernel L are 
identical for all stocks, once returns are standardized to get rid of the idiosyncratic average 
level of the volatility. This will allow us to use the whole data set (of size N x T) to calibrate 
the model. Some dependence of K and L on global properties of firms (such as market cap, 
liquidity, etc.) may be possible, and we leave this for a later study. However, we will see 
later that the universality hypothesis appears to be a reasonable first approximation. 

Removal of the market-wide volatility. We anticipate that the volatility of a single stock has a
market component that depends on the return of the index, and an idiosyncratic component
that we attempt to account for with the returns of the stock itself. As a proxy for the
instantaneous market volatility, we take the cross-sectional average of the squared returns of
individual stocks, i.e.

$$\Sigma_t = \sqrt{\frac{1}{N}\sum_{i=1}^{N} r_{i,t}^2}, \tag{25}$$

and redefine returns and volatilities as $r_t/\Sigma_t$ and $\sigma_t/\Sigma_t$. Finally, as announced above, the
return time series are centered and standardized, and the RS volatility time series are standardized
such that $\langle\sigma_{i,t}^2\rangle_t = 1$ for all $i$'s. (This also gets rid of the multiplicative bias of the
Rogers-Satchell estimator when used with non-Gaussian returns.)

Calibration strategy. The standard procedure used to calibrate ARCH models is maximum
likelihood, which relies on the choice of a family of distributions for the noise component
$\xi$, often taken to be Student-t distributions (which include, in a limit, the Gaussian distribution).
However, this method cannot be used directly in our case because there are far
too many parameters and the numerical optimization of the log-likelihood is both extremely
demanding in computer time and unreliable, as many different optima are expected in general.
A more direct method, in the spirit of the Generalized Least Squares, is to determine
the $1 + q + q(q+1)/2$ independent parameters of the model using empirically determined
correlation functions that depend on $s^2$, $L(\tau)$ and $K(\tau,\tau')$. This latter method is however
sensitive to tail events and can lead to biases. We will therefore use a hybrid strategy, where
a first estimate of these parameters, obtained using the "correlation function" method, serves
as the starting point of a one-step likelihood maximization, which determines the set of most
likely parameters in the vicinity of the "raw" estimate (more details on this below).

Choice of the horizon $q$. In principle, the value of the farthest relevant lag $q$ is an additional
free parameter of the model, and should be estimated jointly with all the others. However,
this would lead to a huge computational effort and to questionable conclusions. In fact, we
will find that the diagonal elements $K(\tau,\tau)$ decay quite slowly with $\tau$ (in line with many
previous studies) and can be accurately determined up to large lags using the correlation
function method. Off-diagonal elements, on the other hand, turn out to be much smaller and
rather noisy. We will therefore restrict the horizon for these off-diagonal elements to $q = 10$
(two weeks) or $q = 20$ (four weeks). Longer horizons, although possibly still significant, lead
to very small out-of-sample extra predictability (but note that longer horizons are needed for
the diagonal elements of $K$).
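The preprocessing described in the methodological points above (removal of the market-wide volatility, then centering and standardization) can be sketched as follows. This is only one possible reading of the procedure: the function name `remove_market_volatility`, the array layout (days along rows, stocks along columns) and the synthetic inputs are assumptions of this sketch, and the RS input is taken to be the squared-volatility proxy.

```python
import numpy as np

def remove_market_volatility(returns, rs_var):
    """Divide out the cross-sectional volatility, then standardize each stock.

    returns, rs_var: arrays of shape (T, N); rs_var holds squared-volatility proxies.
    """
    returns = np.asarray(returns, dtype=float)
    rs_var = np.asarray(rs_var, dtype=float)
    market_vol = np.sqrt(np.mean(returns ** 2, axis=1, keepdims=True))  # Sigma_t
    r = returns / market_vol
    s2 = rs_var / market_vol ** 2
    r = (r - r.mean(axis=0)) / r.std(axis=0)   # centered, unit variance per stock
    s2 = s2 / s2.mean(axis=0)                  # <sigma_i^2> = 1 per stock
    return r, s2

rng = np.random.default_rng(4)
raw_returns = 0.02 * rng.standard_normal((500, 4))
raw_rs_var = 4e-4 * rng.uniform(0.5, 1.5, (500, 4))
std_returns, std_rs_var = remove_market_volatility(raw_returns, raw_rs_var)
print(std_returns.std(axis=0), std_rs_var.mean(axis=0))
```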

4.2 Raw estimation based on correlation functions 

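On top of the already defined four-point correlation functions $\mathcal{C}^{(2)}(\tau)$ and $\mathcal{D}(\tau',\tau'')$ (and their
corresponding "tilde" twins), we will introduce two- and three-point correlation functions that
turn out to be useful (note that the $r_t$'s are assumed to have zero mean):

$$\mathcal{C}^{(1)}(\tau) \equiv \langle r_t\, r_{t-\tau}\rangle_t \tag{26a}$$

$$\mathcal{C}^{(a)}(\tau) \equiv \big\langle (r_t^2 - \langle r^2\rangle)\, |r_{t-\tau}| \big\rangle_t \tag{26b}$$

$$\tilde{\mathcal{C}}^{(a)}(\tau) \equiv \big\langle (\sigma_t^2 - \langle \sigma^2\rangle)\, |r_{t-\tau}| \big\rangle_t \tag{26c}$$

$$\mathcal{L}(\tau) \equiv \big\langle (r_t^2 - \langle r^2\rangle)\, r_{t-\tau} \big\rangle_t \tag{26d}$$

$$\tilde{\mathcal{L}}(\tau) \equiv \big\langle (\sigma_t^2 - \langle \sigma^2\rangle)\, r_{t-\tau} \big\rangle_t \tag{26e}$$

$$\mathcal{L}^{(a)}(\tau) \equiv \langle |r_t|\, r_{t-\tau}\rangle_t \tag{26f}$$

$$\mathcal{D}^{(a)}(\tau',\tau'') \equiv \big\langle (|r_t| - \langle |r|\rangle)\, r_{t-\tau'}\, r_{t-\tau''} \big\rangle_t. \tag{26g}$$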



The $\mathcal{C}^{(1)}(\tau)$ correlation function is by definition equal to $\langle r^2\rangle_t = 1$ for $\tau = 0$, and is usually
considered to be zero for $\tau > 0$. However, as is well known, there are small anti-correlations of
stock returns. On our data set, we find that these linear correlations are very noisy but significant,
and can be fitted by:

$$\mathcal{C}^{(1)}(\tau \ge 1) \approx -0.04\, e^{-0.39\,\tau}, \tag{27}$$

corresponding to a decay time of $\approx 2.5$ days. The $\mathcal{C}^{(a)}$ correlations characterize volatility correlations
and are similar in spirit to $\mathcal{C}^{(2)}$, but they only involve third order moments of $r$, instead of fourth
order moments, and are thus more robust to extreme events. The $\mathcal{L}$ correlations, on the other hand,
characterize the leverage effect, i.e. the influence of the sign of past returns on future volatilities.

These correlation functions allow us to define a well-posed problem of solving a system with
$1 + q + q(q+1)/2$ unknowns ($s^2$, $L(\tau)$, $K(\tau',\tau'')$) using the following $1 + q + q + q(q-1)/2$ independent
equations that involve empirically measured correlation functions (in calligraphic letters), for
$1 \le \tau \le q$ and $1 \le \tau_2 < \tau_1 \le q$:

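$$\langle\sigma^2\rangle = s^2 + \sum_{\tau',\tau''} K(\tau',\tau'')\,\mathcal{C}^{(1)}(\tau'-\tau'') \tag{28a}$$

$$\mathcal{L}(\tau) = \sum_{\tau'} L(\tau')\,\mathcal{C}^{(1)}(\tau-\tau') + \sum_{\tau'} K(\tau',\tau')\,\mathcal{L}(\tau-\tau') + 2\sum_{\tau'\neq\tau} K(\tau',\tau)\,\mathcal{L}(\tau'-\tau) \tag{28b}$$

$$\mathcal{C}^{(a)}(\tau) \approx \sum_{\tau'} L(\tau')\,\mathcal{L}^{(a)}(\tau'-\tau) + \sum_{\tau'} K(\tau',\tau')\,\mathcal{C}^{(a)}(\tau-\tau') + 2\sum_{\tau'>\tau''>\tau>0} K(\tau',\tau'')\,\mathcal{D}^{(a)}(\tau'-\tau,\tau''-\tau) \tag{28c}$$

$$\mathcal{D}(\tau_1,\tau_2) \approx L(\tau_2)\,\mathcal{L}(\tau_1-\tau_2) + L(\tau_1)\,\mathcal{L}(\tau_2-\tau_1) + 2\sum_{\tau'>\tau_2} K(\tau',\tau_2)\Big(\mathcal{D}(\tau_1-\tau_2,\tau'-\tau_2) + \mathcal{C}^{(1)}(\tau_1-\tau') - \mathcal{C}^{(1)}(\tau'-\tau_2)\,\mathcal{C}^{(1)}(\tau_1-\tau_2)\Big) + \sum_{\tau'<\min(\tau_1,\tau_2)} K(\tau',\tau')\,\mathcal{D}(\tau_1-\tau',\tau_2-\tau'), \tag{28d}$$

where all the sums only involve positive $\tau$'s. These equations are exact if all 3-point and 4-point
correlations that involve $r$'s at 3 (resp. 4) distinct times are strictly zero. But since the linear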




Figure 3: Two empirical correlation functions: the leverage $\mathcal{L}(\tau)$ and the correlation of amplitudes
$\mathcal{C}^{(a)}(\tau)$, together with their fits. $\mathcal{L}(\tau)$ is fitted by the sum of two exponentials $-a\,e^{-\tau/b} - c\,e^{-\tau/d}$,
with $a = 0.007$, $b = 327$ days, $c = 0.029$, $d = 17$ days; whereas $\mathcal{C}^{(a)}(\tau)$ is fitted by a power-law
truncated by an exponential: $g\,\tau^{1-\alpha}\,e^{-\tau/\tau_0}$, with $\alpha = 1.14$, $g = 0.106$, $\tau_0 = 290$ days.



correlations $\mathcal{C}^{(1)}(\tau > 0)$ are very small, it is a safe approximation to neglect these higher order
correlations.

Note that the above equations still involve fourth order moments (the off-diagonal elements
of $\mathcal{D}$), which in turn lead to very noisy estimators of the off-diagonal elements of $K$. In order to improve
the accuracy of these estimators, we have cut off large events by transforming the returns $r_t$ into
$r_{\text{cut}} \tanh(r_t / r_{\text{cut}})$, which leaves small returns unchanged but caps large returns. We have chosen
to truncate events beyond $3\sigma$, i.e. $r_{\text{cut}} = 3$. In any case, we will use the above equations in
conjunction with maximum likelihood (for which no cut-off is used) to obtain more robust estimates
of these off-diagonal elements.
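The truncation described above can be sketched in a few lines. This is only an illustration: the function name `soft_clip` is hypothetical, and the returns are assumed to be already normalised to unit standard deviation, so that a cut-off of 3 corresponds to 3 sigma.

```python
import math

def soft_clip(returns, r_cut=3.0):
    """Map each return r to r_cut * tanh(r / r_cut).

    Assumes the returns are normalised to unit standard deviation,
    so that r_cut = 3 caps events beyond roughly 3 sigma.
    """
    return [r_cut * math.tanh(r / r_cut) for r in returns]

# Small returns are almost unchanged, large ones saturate near +/- r_cut.
clipped = soft_clip([0.1, 1.0, 5.0, -20.0])
```

Because tanh is odd and monotonic, the sign and the ordering of the returns are preserved; only the amplitude of extreme events is compressed.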

When solving the above set of equations, we find that the diagonal elements $K(\tau, \tau)$ are an
order of magnitude larger than the corresponding off-diagonal elements $K(\tau, \tau' \neq \tau)$. This was
not expected a priori and is in fact one of the central results of this study; it confirms that daily
returns indeed play a special role in the volatility feedback mechanism, as postulated by ARCH
models. Returns on different time scales, while significant, can be treated as a perturbation of the
main ARCH effect.

4.3 The diagonal kernels 

The above remark suggests calibrating the model in two steps. We first neglect off-diagonal effects
altogether, and determine the $2q + 1$ parameters $s^2$, $L(\tau)$ and $k(\tau) = K(\tau, \tau)$ through the following
reduced set of equations:

$$\langle \sigma^2 \rangle = 1 = s^2 + \sum_{\tau} k(\tau) \qquad (29a)$$

$$\mathcal{L}(\tau) = L(\tau) + \sum_{\tau' \neq \tau} L(\tau')\, \mathcal{C}^{(1)}(\tau-\tau') + \sum_{\tau'} k(\tau')\, \mathcal{L}(\tau-\tau') \qquad (29b)$$

$$\mathcal{C}^{(a)}(\tau) \approx \sum_{\tau'} L(\tau')\, \mathcal{L}^{(a)}(\tau'-\tau) + \sum_{\tau'} k(\tau')\, \mathcal{C}^{(a)}(\tau-\tau'). \qquad (29c)$$

The input empirical correlation functions $\mathcal{L}(\tau)$ and $\mathcal{C}^{(a)}(\tau)$ are plotted in Fig. 3 together with,
respectively, a double-exponential fit and a truncated power-law fit (see legend for parameter
values). L and look very similar to $\mathcal{L}(\tau)$; note that all these functions are approximately
zero for $\tau < 0$. The above equations are then solved using these analytical fits, which leads to the
kernels $k(\tau)$ and $L(\tau)$ that we report in bold in Fig. 4. Using the raw data instead of the fits
for all the correlation functions results in noisier $L(\tau)$ and $k(\tau)$ (thin lines), but still very
close to the bold curves shown in Fig. 4. As expected, $L(\tau)$ is very close to $\mathcal{L}(\tau)$: there is a weak,
but significant leverage effect for individual stocks.

Figure 4: Calibration of the diagonal kernels for stocks, with $q = 100$. Left: $L(\tau)$. Middle:
$K(\tau, \tau)$. Right: $s^2$ as a function of the farthest lag; the red line is a fit according to formula (30)
(see text for parameter values).

We then show in the same Fig. 4 a plot of $s^2(q) = 1 - \sum_{\tau=1}^{q} k(\tau)$ as a function of $q$. Including
the feedback of the far away past progressively decreases the value of the baseline level $s^2$. In order
to extrapolate to $q \to \infty$, we have found that the following fit is very accurate (see Fig. 4):

$$s^2(q) = s^2_\infty + g\, \frac{q^{1-\alpha}}{\alpha - 1}\, e^{-q/q_0}, \qquad (30)$$

with $s^2_\infty \approx 0.21$, $\alpha \approx 1.11$, $g \approx 0.081$ and $q_0 \approx 53$. Several comments are interesting at this stage:

• The asymptotic value $s^2_\infty$ is equal to 20% of the observed squared volatility, meaning that
volatility feedback effects increase the volatility by a factor $\approx 2.25$ (but remember that we
have scaled out the market wide volatility). Such a strong amplification of the volatility of
course resonates with Shiller's "excess volatility puzzle" and gives a quantitative estimate of
the role of self-reflexivity and feedback effects in financial markets [40], [17], [20], [43], [11], [21].

• The above fit corresponds to a power-law behavior, $k(\tau) \approx g\, \tau^{-\alpha}$ for $\tau \ll q_0$, and an
exponential decay for larger lags. Therefore, a characteristic time scale of $q_0 \approx 3$ months appears,
beyond which volatility feedback effects decay more rapidly.

• With a diagonal positive kernel $K$, the condition for positive definiteness of the quadratic
form reads $\sum_\tau L(\tau)^2 / k(\tau) < 4 s^2$. The estimated values of $L$ and $k$ yield a left-hand side
equal to 0.595, while the right-hand side amounts to 0.823.

• Using the results of Sect. 2.2 one can compute the theoretical value of $\langle \sigma^4 \rangle$ that corresponds
to the empirically determined $k(\tau)$. As expected from the fact that $g$ is small and $\alpha$ close to
unity, one finds that the fluctuations of volatility induced by the long-memory feedback are
weak: $\langle \sigma^4 \rangle = 1.156$ (see Eq. (12) above). Including the contribution of the leverage kernel
$L(\tau)$ to $\langle \sigma^4 \rangle$ does not change the final numerical value much: it shifts from 1.156 to 1.161.

• The above result shows that most of the kurtosis of the returns $r_t = \sigma_t \xi_t$ must come from
the noise $\xi_t$, which cannot be taken as Gaussian. Using the diagonal ARCH model with the
kernels determined as above to predict $\sigma_t$, one can study the distribution of $\xi_t = r_t / \sigma_t$ and
find the most likely Student-t distribution that accounts for it. We find that the optimal
number of degrees of freedom is $\nu \approx 6.4$, and the resulting fit is shown in Fig. 5. Note that
while the body and 'near-tails' of the distribution are well reproduced by the Student-t, the
far-tails are still fatter than expected. This is in agreement with the commonly accepted tail
index of $\nu_{\text{tail}} \approx 4$, significantly smaller than 6.4.

Figure 5: Cumulative distribution function of the residuals $\xi_t = r_t / \sigma_t$. The magenta line is the
Student distribution with best (maximum-likelihood) fitting tail parameter $\nu = 6.4$. Far tails
suggest a fatter distribution with a smaller value of $\nu_{\text{tail}} \approx 4$ (cyan).

Assuming $\xi_t$ to be a Student-t random variable with $\nu = 6.4$ degrees of freedom, we have
re-estimated $k(\tau)$ and $L(\tau)$ using maximum-likelihood (see below). The final results are more noisy,
but close to the above ones after fitting. Our strategy is thus to fix both $k(\tau)$ and $L(\tau)$, and only
focus on the off-diagonal elements of $K$ henceforth.
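As an illustration of how a most likely Student-t distribution can be selected for the residuals, here is a minimal sketch. It assumes residuals rescaled to unit variance and scans a user-supplied grid of $\nu$ values; the function names `student_loglik` and `best_nu` are hypothetical, and the grid search is a stand-in for whatever optimiser is actually used.

```python
import math

def student_loglik(xi, nu):
    """Log-density of a unit-variance Student-t with nu > 2 degrees of freedom."""
    a2 = nu - 2.0  # squared scale: a^2 = (nu - 2) * sigma^2 with sigma = 1
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(math.pi * a2)
            - 0.5 * (nu + 1.0) * math.log1p(xi * xi / a2))

def best_nu(residuals, grid):
    """Return the grid value of nu with the highest average log-likelihood."""
    def avg_ll(nu):
        return sum(student_loglik(x, nu) for x in residuals) / len(residuals)
    return max(grid, key=avg_ll)

# Hypothetical residuals containing one large outlier: a small nu is preferred.
nu_hat = best_nu([0.1, -0.2, 0.3, 8.0], [3.0, 50.0])
```

The scale $a^2 = (\nu - 2)\sigma^2$ keeps the variance fixed at $\sigma^2$ while $\nu$ only controls the tail thickness, which is why $\nu$ can be fitted separately from the volatility model.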

4.4 The off-diagonal kernel, raw & maximum likelihood estimation 

We can now go back to Eq. (28d) that allows one to solve for $K(\tau', \tau'' > \tau')$, once $k(\tau)$ and $L(\tau)$
are known. As announced above, we choose $q = 20$ for the time being. Because $\mathcal{D}$ involves the
fourth moment of the returns, this procedure is not very stable, even with a lot of data pooled
together, and even after the truncation of large returns. Maximum likelihood estimates would be
better suited here, but the dimensionality of the problem prevents a brute force determination of
the $q(q-1)/2$ parameters. In order to gain some robustness, we use the following strategy. The
Student log-likelihood per point $\mathcal{I}_\nu$ is given by (see footnote 3)

$$\mathcal{I}_\nu(L, K, \{r_t\}) = \frac{1}{2T} \sum_t \left[ \nu \log a_t^2 - (\nu + 1) \log(a_t^2 + r_t^2) \right], \qquad a_t^2 = (\nu - 2)\, \sigma_t^2 \qquad (31)$$

where $\sigma_t^2$ is given by the QARCH model expression, Eq. (4). We fix $\nu = 6.4$ and determine
numerically the gradient $\partial \mathcal{I}_\nu$ and the Hessian $\partial\partial \mathcal{I}_\nu$ of $\mathcal{I}_\nu$ as a function of all the $q(q-1)/2$
off-diagonal coefficients $K(\tau', \tau'' > \tau')$, computed either around the raw estimates of these parameters,
or around the ARCH point where all these coefficients are zero. Note that $\partial \mathcal{I}_\nu$ is a vector with
$q(q-1)/2$ entries and $\partial\partial \mathcal{I}_\nu$ is a $q(q-1)/2 \times q(q-1)/2$ matrix. It so happens that the eigenvalues of
the Hessian are all found to be negative, i.e. the starting point is in the vicinity of a local maximum.
This allows one to find easily the values of the $q(q-1)/2$ parameters that maximize the value of
$\mathcal{I}_\nu$; they are (symbolically) given by:

$$K^* = K_0 - \left( \overline{\partial\partial \mathcal{I}_\nu} \right)^{-1} \cdot \overline{\partial \mathcal{I}_\nu}, \qquad (32)$$

where $K_0$ is the chosen starting point, either the raw estimate $K_0 = K_{\text{raw}}$ based on Eq. (28d)
or simply $K_0 = 0$ if one starts from a diagonal ARCH model, and the overline on top of $\partial\partial \mathcal{I}$, $\partial \mathcal{I}$
indicates averaging over stocks. This one-step procedure is only approximate but can be iterated;
it however assumes that the maximum is close to the chosen initial point, and would not work
if some eigenvalues of the Hessian become positive. In our case, both starting points ($K_0 = 0$
or $K_0 = K_{\text{raw}}$) lead to nearly exactly the same solution; furthermore the Hessian recomputed at
the solution point is very close to the initial Hessian, indicating that the likelihood is a locally
quadratic function of the parameters, and the gradient evaluated at the solution point is very close
to zero in all directions, confirming that a local maximum has been reached.

3 In the following we do not truncate the large returns, but completely neglect the weak linear correlations $\mathcal{C}^{(1)}(\tau)$
that are present for small lags, and that should in principle be taken into account in the likelihood estimator.
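The one-step update can be illustrated on a toy problem. The sketch below is not the authors' implementation: it uses central finite differences for the gradient and Hessian, a small Gaussian elimination routine, and a hypothetical concave quadratic `toy_loglik` standing in for the stock-averaged Student log-likelihood. All function names are assumptions made for the example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gradient(f, x, h=1e-3):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = x[:]
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return H

def one_step(f, k0):
    """Single Newton update K* = K0 - H^{-1} g, as in Eq. (32)."""
    step = solve_linear(hessian(f, k0), gradient(f, k0))
    return [k - s for k, s in zip(k0, step)]

def toy_loglik(k):
    """Hypothetical concave objective with its maximum at (0.01, -0.005)."""
    a, b = k[0] - 0.01, k[1] + 0.005
    return -a * a - 2.0 * b * b + 0.5 * a * b

# Starting from the "ARCH point" (all off-diagonal coefficients zero).
k_star = one_step(toy_loglik, [0.0, 0.0])
```

For an exactly quadratic objective a single step lands on the maximum, which mirrors the observation that both starting points lead to nearly the same solution when the likelihood is locally quadratic. For a non-quadratic likelihood the step would need to be iterated, and it is only meaningful while the Hessian stays negative definite.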

Before presenting our results, we briefly digress to discuss the bias and error
on the estimated parameters $K^*$ as well as on the resulting maximal average likelihood $\overline{\mathcal{I}}$. The
likelihood $\mathcal{I}$, its gradient $\partial \mathcal{I}$ and Hessian matrix $\partial\partial \mathcal{I}$ are generic functions of the set of parameters
$K$ to be estimated, and of the dataset, of size $n$. As the number $n$ of observations tends to infinity,
the covariance matrix of the ML estimator of the parameters is well known to be $(nI)^{-1}$, where $I$
is the Fisher Information matrix

$$I = \mathbb{E}[-\partial\partial \mathcal{I}] \approx -\overline{\partial\partial \mathcal{I}}(K^*)$$

while the asymptotic bias scales as $n^{-1}$ and is thus much smaller than the above error ($\sim n^{-1/2}$).
As a consequence, ML estimates of $K$ exceeding $\pm\, \mathrm{diag}(-n\, \overline{\partial\partial \mathcal{I}})^{-1/2}$ will be deemed significant.
The bias on the average in-sample (IS) value of the maximum likelihood itself can be computed to
first order in $1/n$, and is very generally found to be $+M/2n$, where $M$ is the number of parameters to
be determined. Similarly, the bias on the average out-of-sample value of the maximum likelihood
is $-M/2n$ (see footnote 4). Since each sampling of our data set will contain $n = T \cdot N/2 \approx 350{,}000$ observations,
differences of likelihood smaller than $M/2n \approx 3 \cdot 10^{-4}$ are insignificant when $M = 190$
(corresponding to all off-diagonal elements when $q = 20$). This number is $\approx 5$ times smaller when one
considers the restricted models introduced above (which contain $\approx 40$ parameters).
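The orders of magnitude quoted above can be checked with a few lines of arithmetic. The helper name `likelihood_bias` is hypothetical; the numbers ($q = 20$, $n \approx 350{,}000$, about 40 parameters for the restricted models) are the ones given in the text.

```python
def likelihood_bias(n_params, n_obs):
    """First-order bias magnitude M / (2 n) of the average maximum likelihood."""
    return n_params / (2.0 * n_obs)

q = 20
M_full = q * (q - 1) // 2                       # off-diagonal coefficients for q = 20
bias_full = likelihood_bias(M_full, 350_000)    # about 2.7e-4, i.e. ~3e-4
bias_restricted = likelihood_bias(40, 350_000)  # restricted models, ~40 parameters
ratio = bias_full / bias_restricted             # roughly a factor 5
```

Likelihood differences between models should therefore only be read as significant when they exceed this threshold for the larger of the two models being compared.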

The most likely off-diagonal coefficients of $K^*$ are found to be highly significant (see Tab. 1): the
IS increase of likelihood from the purely diagonal ARCH($q$) model is $\Delta \mathcal{I} \approx 10^{-3}$ per point. This is
confirmed by an out-of-sample (OS) experiment, where we determine $K^*$ on half the pool of stocks
and use it to predict the volatility on the other half (whence the above factor 1/2 in the numerical
estimation of $n$). The experiment is performed over $N_{\text{samp}} = 150$ independent pool samplings.
The average OS likelihood is very significantly better for the full off-diagonal kernel $K^*$ than for
the diagonal ARCH($q$), itself being better than the raw estimate $K_{\text{raw}}$ based on Eq. (28d), which is
probably subject to biases due to the truncation procedure. Note that the full off-diagonal kernel
$K^*$ has many more parameters than the diagonal ARCH($q$); it therefore starts with a handicap
out-of-sample since the bias on the OS likelihood is, as recalled above, $-M/2n$.

However, as announced above, the off-diagonal elements of $K^*$ are a factor ten smaller than
the corresponding diagonal values. We give a heat-map representation of the matrix $K^*$ in Fig. 6.
Various surprises are immediately apparent.

First, while the off-diagonal elements are mostly positive for small $\tau', \tau''$, clear negative streaks
appear for intermediate and large $\tau$s. This is unexpected, since one would have naively guessed that
any trend (i.e. positive realized correlations between returns) should increase future volatilities.
Here we see that some quadratic combinations of past returns contribute negatively to the volatility.
This will show up in the spectral properties of $K^*$ (see Section 4.5 below).

The second surprise is that there does not seem to be any obvious structure of the matrix that
would be reminiscent of one of the simple models represented in Fig. 2. This means that the fine
structure of volatility feedback effects is much more subtle than anticipated.

We have nevertheless implemented a restricted maximum-likelihood estimation that imposes
the structure of one of the models considered in Section 3. We find that all these models are
equally "bad": although they lead to a significant increase of likelihood compared to the pure
ARCH case, both IS and OS, they are all superseded by the unconstrained model shown in Fig. 6,
again both IS and OS (see Tab. 1). The best OS model is "Long-trend", with a kernel $g_{\text{LT}}(\ell)$
shown in Fig. 7 together with the functions $g_2(\ell)$, $g_{\text{BB}}(\ell)$, $g_Z(\ell)$. While $g_{\text{LT}}(\ell)$ looks roughly
like an exponential with memory time 10 days, the two-day return kernel $g_2(\ell)$ reveals intriguing
oscillations. Two-day returns influence the volatility quite differently from one-day returns! On
the other hand, we do not find any convincing sign of the multiscale "BB" structure postulated in [10].

4 These corrections to the likelihoods lead to the (per point) Akaike Information Criterion [2]
AIC $= -2(\mathcal{I} - M/n)$, that trades off the log-likelihood and the number of parameters. AIC is used for model
selection purposes mainly. When comparing parametric models with the same number of parameters, AIC is not
more powerful than the likelihood.

Figure 6: Heatmap of the unconstrained model. Left: Raw estimation; Right: ML estimation
(white spots correspond to values smaller than their corresponding error margin). Red spots
correspond to values larger than 0.01. Note the negative streaks at large lags, and the significant
off-diagonal entry for $\tau = 1$, $\tau' = 2$ days.

Note that the structure shown in Fig. 7 is found to be stable when $q$ is changed. It would be
interesting to subdivide the pool of stocks in different categories (for example, small caps/large
caps) or in different sub-periods, and study how the off-diagonal structure of $K$ is affected. However,
we note that the dispersion of likelihood over different samplings of the pool of stocks is only 50%
larger than the "true" dispersion, due only to random samplings of a fixed QARCH model with
parameters calibrated to the data (see caption of Tab. 1). This validates, at least as a first
approximation, the assumption of homogeneity among all the stock series that have been averaged
over.

To conclude this empirical part, we have performed several ex-post checks to be sure that our
assumptions and preliminary estimations are justified. First, we have revisited the most likely value
of the Student parameter $\nu$ (tail index of the distribution of the residuals $\xi(t) = r(t) / \sigma_{\text{QARCH}}(t)$)
with now the full matrix $K^*$, plus diagonal terms up to $q = 100$, and found again $\nu = 6.4$.
This shows that our procedure is consistent from that point of view. Second, we have computed
the quadratic correlation of the residuals $\xi_t$, which are assumed in the model to be IID random
variables with, in particular, no variance autocorrelation: $\langle \xi_t^2 \xi_{t-\tau}^2 \rangle - 1 = 0$ for $\tau \neq 0$. Empirically,
we in fact observe a negative correlation of weak magnitude, which decays exponentially with
time. This additional dependence of the amplitude of the residuals, together with the excess fat
tails of their probability distribution, is probably a manifestation of the truly exogenous events
occurring in financial markets, which have different statistical properties [27] and are not captured by
the endogenous feedback mechanism. Finally, about the universality hypothesis, we discuss in
the caption of Tab. 1 how the assumption of homogeneous stocks is validated by comparing the
cross-sectional dispersion of the likelihoods obtained empirically and on simulated series.

4.5 Spectral properties of the empirical kernel K 

As discussed in Section 15751 another way to decipher the structure of K is to look at its eigenvalues 
and eigenvectors. We show in Fig. 8 the eigenvalues of $K^*$ as a function of the eigenvalues of the
purely diagonal ARCH model. We see that a) the largest eigenvalue is clearly shifted upwards by
the off-diagonal elements; b) the structure of the top eigenvector is non-trivial, and has positive
contributions at all lags (up to noise); c) the unconstrained estimations (both raw and ML)
lead to 6 very small eigenvalues (perhaps even slightly negative) that all constrained models fail
to reproduce.

6 Note that the true likelihood is not necessarily larger than the realized one under a misspecified model.

Figure 7: Plot of the empirically determined kernels $g_2(\ell)$, $g_{\text{BB}}(\ell)$, $g_Z(\ell)$ and $g_{\text{LT}}(\ell)$ for the
restricted models of Section 3 (surviving panel titles: Zumbach, Long-Trend).

[Table 1 body lost in extraction; surviving entries: -1.31957, -1.31956]

Table 1: Log-likelihoods, according to Eq. (31). In-sample and out-of-sample likelihoods are
computed as follows: for each of $N_{\text{samp}} = 150$ iterations, half of the stock names are randomly chosen
for the calibration of $K$, $L$ and the likelihood is computed with the obtained $K^*$, $L^*$ on each series
of the same sample ('In-sample' likelihoods). Then, the likelihood is again computed with the same
parameters but on the series of the other sample ('Out-of-sample' likelihoods). While the former
quantify how much the estimated model succeeds in reproducing the given sample, the latter
measure the reliability of the model on other similar datasets. In order to quantify the validity of the
model in an absolute way, the likelihood can be compared with the "true" value, obtained with
simulated data (since an analytical treatment is out of reach). The average likelihood per point
$\mathcal{I}_\nu(r_t)$ with $r_t$ simulated as a QARCH with parameters $K^*$, $L^*$, and $\nu = 6.4$ is equal to $-1.34019$,
which is 1.5% away from the empirical value (see footnote 6). The likelihoods reported in the table are averages
over all samplings, and the corresponding 1-s.d. dispersion is found to be $\approx 3 \cdot 10^{-3}$ in all cases, to
be compared to $2 \cdot 10^{-3}$ for random samplings of a fixed QARCH model with the same parameters.

Figure 8: Spectral decomposition of the feedback kernel $K$, for the raw, ML and ML+LT
estimates. Left: The difference between the ranked eigenvalues of the estimated kernels and those
of $K_{\text{ARCH}}$ as a function of the latter (again ranked; horizontal axis: eigenvalues of $K_{\text{ARCH}}$). The dashed
oblique line has slope $-1$ and separates positive eigenvalues from negative ones. Right: Structure of the
first three and last three eigenvectors. Whatever the estimation method, the first eigenvector has a
non-trivial structure, with mostly positive components, indicating a genuine departure from the diagonal ARCH
benchmark, for which we would find a single peak at $\tau = 1$.

The positivity of all eigenvalues is not guaranteed a priori, because nothing in the calibration
procedure imposes the positivity of the matrix $K^*$. Although we would naively expect that past
excitation could only lead to an amplification of future volatility (i.e. that only strictly positive
modes should appear in the feedback kernel), we observe that quasi-neutral modes do occur.
This is clearly related to the negative streaks noted above at large lags, but we have no intuitive
interpretation for this effect at this stage.
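A two-lag toy example shows how off-diagonal entries can both push the top eigenvalue up and create a quasi-neutral mode. The numbers below are hypothetical and chosen only for readability; the function name `eig_sym_2x2` is an assumption of this sketch.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean + radius, mean - radius

# Hypothetical diagonal entries k(1), k(2), without and with an off-diagonal K(1, 2).
diag_only = eig_sym_2x2(0.07, 0.0, 0.04)
with_offdiag = eig_sym_2x2(0.07, 0.02, 0.04)

# A neutral mode appears when K(1, 2)^2 = k(1) * k(2): one eigenvalue vanishes.
neutral = eig_sym_2x2(0.04, 0.02, 0.01)
```

In this 2x2 setting the sign of the off-diagonal entry does not change the eigenvalues, only the eigenvectors: a negative entry makes the near-zero mode a same-sign combination of the two past returns, a small-scale analogue of a volatility-neutral pattern.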

5 Empirical study: stock index 

We complete our analysis by a study of the returns of the S&P-500 index in the QARCH framework. 
We use a long series of more than 60 years, from Oct. 1948 to Sept. 2011 (15 837 days). 

The computation of the correlation functions and the 'raw' calibration of a long ARCH(512)
yields a $s^2(q)$ that can be fitted with formula (30) and the following parameters: $s^2_\infty \approx 0.20$, $\alpha \approx
1.28$, $g \approx 0.162$ and $q_0 \approx 262$. Contrary to single stocks, the one-step Maximum Likelihood
estimation failed with $q = 20$, as the gradient and Hessian matrix evaluated at the arrival point
are, respectively, large and not negative definite. Although the starting point appears to be close
to a local maximum (the Hessian matrix is negative definite), the one-step procedure does not lead
to that maximum.

In order to control the Maximum Likelihood estimation better, we lower the dimensionality
of the parameter space and estimate a QARCH(10), although still with a long memory diagonal
($q = 50$). Here the procedure turns out to be legitimate, and the resulting kernel $K$ is depicted
in Fig. 9 (right). Interestingly, the off-diagonal content in the QARCH model for index returns is
mostly not significant (again, white regions correspond to values not exceeding their theoretical
uncertainty) apart from a handful of negative values around $\tau = 8$ to $10$ and one strip at $\tau = 5$.
The contribution of this strip to the QARCH feedback can be made explicit as

$$2\, r_{t-5} \sum_{\tau<5} K(5,\tau)\, r_{t-\tau}$$

and describes a trend effect between the daily return last week $r_{t-5}$ and the (weighted) cumulated
return since then $\sum_{\tau<5} K(5,\tau)\, r_{t-\tau}$. It would be interesting to know whether this finding is
supported by some intuitive feature of the trading activity on the S&P-500 index. Note that,
again, none of the "simple" structures discussed in Section 3 is able to account for the structure
of $K^*$.

Figure 9: Heatmap of the $q = 10$ kernel for the index volatility. Left: the raw estimation; Right:
the ML estimation with 'raw' prior (again, we have checked that the ARCH prior leads to very
close results).

[Table 2 body lost in extraction; surviving labels and entries: ARCH+ML, IS, -1.16704, -1.17079]

Table 2: Average log-likelihoods, according to Eq. (31), for the stock index.

The spectral analysis reveals a large eigenvalue much above the ARCH prediction, and a couple
of very small eigenvalues, similarly to what was found for the stock data. However, the eigenvectors
associated with them exhibit different patterns: the first eigenvector does not reveal the expected
collective behavior, but rather a dominant $\tau = 1$ component, with a significant $\tau = 2$ component
of opposite sign. The other modes do not show a clear signature and are hard to interpret.

The procedure for computing in-sample/out-of-sample likelihoods is similar to the case of the
stock data, but the definitions of the universes differ somewhat since we only have a single time
series at our disposal. Instead of randomly selecting half of the series, we select half of the dates
to define the in-sample universe $\Omega$, on which the correlation functions are computed and both
estimation methods ('raw', and one-step maximum likelihood) are applied. Then we evaluate the
likelihoods of the calibrated kernels, first on $\Omega$ to obtain the 'in-sample' likelihoods, and then on
the complement of $\Omega$ to get the 'out-of-sample' likelihoods. We iterate $N_{\text{samp}} = 150$ times and
draw a random subset of dates every time, then average the likelihoods, which we report in
Table 2. The 1-s.d. systematic dispersion of the samplings is now $\approx 7 \cdot 10^{-3}$. Because of the smaller
number of observations in the index data compared to the stock data, corrections for the bias
induced by the number of parameters $M$ become relevant. Adjusting the out-of-sample likelihood
by subtracting the bias $-M/2n \approx 3 \cdot 10^{-3}$ (with $n \approx T/2 = 7.5 \cdot 10^{3}$ and $M = q(q-1)/2 = 45$)
brings the ARCH+ML result to a level competitive with ARCH (but not obviously better), and
certainly better than the raw estimate.

6 Time reversal invariance



As noticed in the introduction, QARCH models violate, by construction, time reversal invariance.
Still, the correlation of the squared returns $\mathcal{C}^{(2)}(\tau)$ is trivially invariant under time reversal, i.e.
$\mathcal{C}^{(2)}(\tau) = \mathcal{C}^{(2)}(-\tau)$. However, the correlation of the true squared volatility with past squared
returns, denoted here $\widetilde{\mathcal{C}}^{(2)}(\tau)$, is in general not (see [55], [13] for a general discussion). A measure of TRI violations
is therefore provided by the integrated difference $\Delta(\tau)$:

$$\Delta(\tau) = \sum_{\tau'=1}^{\tau} \left[ \widetilde{\mathcal{C}}^{(2)}(\tau') - \widetilde{\mathcal{C}}^{(2)}(-\tau') \right]. \qquad (33)$$

The empirical determination of $\widetilde{\mathcal{C}}^{(2)}(\tau)$ and $\Delta(\tau)$ for stock returns is shown in Fig. 10. Although
less strong than for simulated data (see Fig. 11), we indeed find a clear signal of TRI violation
for stock returns, in agreement with a related study by Zumbach [37]. We compare in Fig. 11
the quantity $\Delta(\tau)$ obtained from a bona fide numerical simulation of the model, with previously
estimated parameters. Note that any measurement noise on the volatility $\sigma^2$ tends to reduce
the TRI violations, but we have performed the numerical simulation in such a way as to reproduce this
measurement noise as faithfully as possible.
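A minimal sketch of how the asymmetry measure of Eq. (33) can be estimated from two aligned series is given below. It assumes that the volatility/square-return correlation is taken as a plain sample covariance between $\sigma^2_t$ and $r^2_{t-\tau}$ (the exact normalisation used in the paper is defined outside this section); the function names are hypothetical.

```python
def cross_cov(x, y, lag):
    """Sample covariance between x[t] and y[t - lag]; lag may be negative."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ts = range(max(lag, 0), min(n, n + lag))
    return sum((x[t] - mx) * (y[t - lag] - my) for t in ts) / len(ts)

def tri_asymmetry(sigma2, r2, max_lag):
    """Cumulative asymmetry Delta(tau) for tau = 1..max_lag, as in Eq. (33)."""
    out, total = [], 0.0
    for lag in range(1, max_lag + 1):
        total += cross_cov(sigma2, r2, lag) - cross_cov(sigma2, r2, -lag)
        out.append(total)
    return out
```

With `sigma2` equal to `r2` the estimator returns zero at every lag, as expected for a time-reversal symmetric autocovariance. A series in which the volatility is driven by past squared returns gives a positive $\Delta(\tau)$.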

However, the alert reader should worry that the existence of asymmetric leverage correlations
$\mathcal{L}(\tau > 0) \neq 0$ between past returns and future volatilities is in itself a TRI-violating mechanism,
which has nothing to do with the ARCH feedback mechanism. In order to ascertain that the
effect we observe is not a spurious consequence of the leverage effect, we have also computed the
contribution of $\mathcal{L}(\tau)$ to $\Delta(\tau)$, which reads to lowest order and schematically:

$$\Delta(\tau) = \sum_{\tau'=1}^{\tau} L(\tau') \left[ \mathcal{L}(\tau'-\tau) - \mathcal{L}(\tau'+\tau) \right] + K\ \text{contributions}. \qquad (34)$$

The first term on the right-hand side is plotted in the inset of Fig. 10, and is found to have a
negative sign, and an amplitude much smaller than $\Delta(\tau)$ itself. It is therefore quite clear that
the TRI violation reported here is genuinely associated with the ARCH mechanism and not with the
leverage effect, a conclusion that concurs with that of [47].

Still, the smallness of the empirical asymmetry compared with the simulation results suggests
that the ARCH mechanism is "too deterministic". It indeed seems reasonable to think that the
baseline volatility $s^2$ has no reason to be constant, but may contain an extra random contribution.
Writing

$$\sigma^2(t) = \sigma_A^2(t) + \omega(t); \qquad \langle \omega \rangle = 0; \qquad \langle \omega(t)\, \omega(t-\tau) \rangle = C_\omega(\tau) = C_\omega(-\tau)$$

with $\omega_t$ a noise contribution and $\sigma_A$ the ARCH volatility (see footnote 7), i.e. deterministic when conditioned on
past returns, then the asymmetry is found to be given by:

$$\Delta(\tau) = \Delta_A(\tau) - \sum_{\tau''=1}^{\tau} \sum_{\tau'=1}^{q} k(\tau') \left[ C_\omega(\tau' - \tau'') - C_\omega(\tau' + \tau'') \right].$$

If one assumes that the correlation function $C_\omega$ is positive and decays with time, the extra
contribution to the asymmetry is negative, and reduces the observed TRI violation. This conclusion speaks
in favor of a mixed approach to volatility modeling, bringing together elements of autoregressive
QARCH models with those of stochastic volatility models. It would in fact be quite surprising if
the volatility, although unobservable, were a purely deterministic function of past returns.
Although the behavioral interpretation of the above construction is not clear at this stage, the
uncertainty on the baseline volatility level $s^2$ could come, for example, from true exogenous factors
that mix in with the volatility feedback component described by the QARCH framework.



7 For the sake of clarity we consider here a diagonal ARCH framework, but the argument is straightforwardly
generalized for a complete QARCH.

Figure 10: Measure of time-reversal asymmetry (Eq. (33)) for the stock data. Inset: The contribution
to $\Delta(\tau)$ stemming from the leverage, i.e. the quantity $\sum_{\tau'=1}^{\tau} L(\tau') \left[ \mathcal{L}(\tau'-\tau) - \mathcal{L}(\tau'+\tau) \right]$. Note
that this contribution is negative, and an order of magnitude smaller than $\Delta(\tau)$ itself.

Figure 11: Measure of time-reversal asymmetry (Eq. (33)) for a simulated ARCH model with Student
($\nu = 4$) residuals on the 5-minute scale. The parameters of the simulation are the estimated
kernel $k^*(\tau)$ and $L^*(\tau)$ for stocks, with $q = 20$. At each date, 100 intraday prices are simulated
(corresponding to the number of 5-minute bins inside 8 hours) with the same $\sigma_t$ given by the
QARCH model. The volatility is then computed using Rogers-Satchell's estimator, exactly as for
empirical data.

7 Conclusion, extensions

We have revisited the QARCH model, which postulates that the volatility today can be expressed as
a general quadratic form of the past daily returns $r_t$. The standard ARCH or GARCH framework is
recovered when the quadratic kernel is diagonal, which means that only past squared daily returns
feed back on today's volatility. This is a very restrictive a priori assumption, and the aim of the
present study was to unveil the possible influence of other quadratic combinations of past returns,
such as, for example, squared weekly returns. We have defined and studied several sub-families of
QARCH models that make intuitive sense and have a restricted number of free parameters.

The calibration of these models on US stock returns has revealed several features that we did
not anticipate. First, although the off-diagonal (non ARCH) coefficients of the quadratic kernel are
found to be highly significant both in-sample and out-of-sample, they are one order of magnitude
smaller than the diagonal elements. This confirms that daily returns indeed play a special role
in the volatility feedback mechanism, as postulated by ARCH models. Returns on different time
scales can be thought of as a perturbation of the main ARCH effect. The second surprise is
that the structure of the quadratic kernel does not conform to any of the simpler QARCH models
proposed so far in the literature. The fine structure of volatility feedback is much more subtle than
anticipated. In particular, neither the model proposed in [10] (where returns over several horizons
play a special role), nor the trend model of Zumbach in [48] are supported by the data. The third
surprise is that some off-diagonal coefficients of the kernel are found to be negative for large lags,
meaning that some quadratic combinations of past returns contribute negatively to the volatility.
This also shows up in the spectral properties of the kernel, which is found to have very small
eigenvalues, suggesting the existence of unexpected volatility-neutral patterns of past returns.

As for the diagonal part of the quadratic kernel, our results are fully in line with all past studies:
the influence of the past squared-return $r^2_{t-\tau}$ on today's volatility $\sigma_t^2$ decays as a power-law $g\, \tau^{-\alpha}$
with the lag $\tau$, at least up to $\tau \approx 2$ months, with an exponent $\alpha$ close to unity ($\alpha \approx 1.11$), which
is the critical value below which the volatility diverges and the model becomes non-stationary. As
emphasized in [10], markets seem to operate close to criticality (this was also noted in different
contexts, see [12], [13], [3], [44], [21], [S3] for example). The smallness of $\alpha - 1$ has several important
consequences: first, this leads to long-memory in the volatility; second, the average square volatility
is a factor 5 higher than the baseline volatility, in line with the excess volatility story [30]: most of
the market volatility appears to be endogenous and comes from self-reflexive, feedback effects (see
e.g. [43], [11], [21] and references therein). Third, somewhat paradoxically, the long memory nature of
the kernel leads to small fluctuations of the volatility. This is due to a self-averaging mechanism
occurring in the feedback sum, that kills fat tails. This means that, perhaps unexpectedly, the high
kurtosis of the returns in ARCH models cannot be ascribed to volatility fluctuations but rather to
leptokurtic residuals, also known as unexpected price jumps.
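The amplification factors quoted for stocks can be recovered from the fitted baseline level. The sketch below assumes the extrapolation formula has the form $s^2(q) = s^2_\infty + g\, q^{1-\alpha} e^{-q/q_0} / (\alpha - 1)$ with the stock parameters reported in Section 4.3; the function name `baseline_fraction` is hypothetical.

```python
import math

def baseline_fraction(q, s2_inf=0.21, alpha=1.11, g=0.081, q0=53.0):
    """Assumed form of fit (30): baseline level s^2(q) with lags up to q included."""
    return s2_inf + g * q ** (1.0 - alpha) / (alpha - 1.0) * math.exp(-q / q0)

# For very large q the exponential cut-off removes the power-law term entirely.
variance_gain = 1.0 / baseline_fraction(1e9)   # total variance over baseline variance
volatility_gain = math.sqrt(variance_gain)     # same ratio in volatility units
```

With $s^2_\infty \approx 0.21$ the variance ratio is slightly below 5 and the volatility ratio slightly above 2, consistent with the rounded figures used in the text.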

Related to price jumps, we should add the following remark, which stresses the difference
between endogenous jumps and exogenous jumps within the ARCH framework. Several
studies have revealed that the volatility relaxes after a jump as a power law, akin to Omori's law
for earthquake aftershocks: $\sigma_t^2 \sim \sigma_0^2\,\tau^{-\theta}$, where $t - \tau$ is the time of the jump. The value of the
exponent $\theta$ seems to depend on the nature of the initial price jump. When the jump is triggered
by exogenous news, $\theta \approx 1$ [28] [27], whereas when the jump appears to be of endogenous origin,
the value of $\theta$ falls around $\theta \approx 1/2$ [35] [57]. In other words, as noted in [2JJ, the volatility seems to
resume its normal course faster after news than when the jump seems to come from nowhere.
A similar difference in the response to exogenous and endogenous shocks was also reported in [41]
for book sales. Now, if one simulates long histories of prices generated using an ARCH model with
a diagonal kernel decaying as $g\tau^{-\alpha}$, one can measure the exponent $\theta$ by conditioning on a large
price jump (which can only be endogenous, by definition!). One finds that $\theta$ varies continuously
with the amplitude of the initial jump, and saturates around $\theta \approx 1/2$ for large jumps (we have not
found a way to show this analytically). A similar behavior is found within multifractal models as
well [42]. If on the other hand an exogenous jump is artificially introduced in the time series by
imposing a very large value of $\xi_{t=0}$, one expects the volatility to follow the decay of the kernel
and decay as $g\tau^{-\alpha}\,\xi_0^2$, leading to $\theta = \alpha \approx 1$. Therefore, the dichotomy between endogenous and
exogenous shocks seems to be well reproduced within the ARCH framework.
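
The exogenous case can be illustrated with a minimal sketch, assuming a diagonal ARCH model $\sigma_t^2 = s^2 + \sum_{\tau=1}^{q} k(\tau)\, r_{t-\tau}^2$ with $k(\tau) = g\tau^{-\alpha}$ and unit-variance residuals. Under these assumptions the mean excess variance $e_t$ after a shock imposed at $t = 0$ obeys $e_t = k(t)\,R + \sum_{\tau=1}^{t-1} k(\tau)\, e_{t-\tau}$, where $R$ is the excess squared return of the shock. The first term is the direct kernel response, which decays exactly with exponent $\alpha$; the second is the feedback correction. The values of `g`, `q` and `shock_excess` are hypothetical, and the function names are not from the paper.

```python
import math


def mean_excess_response(kernel, shock_excess, horizon):
    """Mean excess variance e_t for t = 1..horizon after a shock at t = 0.

    kernel[tau - 1] holds k(tau). Solves
    e_t = k(t) * shock_excess + sum_{tau=1}^{t-1} k(tau) * e_{t-tau}.
    """
    q = len(kernel)
    excess = [0.0] * (horizon + 1)  # excess[0] is unused
    for t in range(1, horizon + 1):
        direct = kernel[t - 1] * shock_excess if t <= q else 0.0
        feedback = sum(kernel[tau - 1] * excess[t - tau]
                       for tau in range(1, min(t - 1, q) + 1))
        excess[t] = direct + feedback
    return excess


def local_exponent(values, t1, t2):
    """Effective power-law exponent theta between lags t1 and t2."""
    return -math.log(values[t2] / values[t1]) / math.log(t2 / t1)


# Hypothetical parameters: alpha from the text, g and q chosen for illustration.
alpha, g, q = 1.11, 0.05, 200
kernel = [g * tau ** (-alpha) for tau in range(1, q + 1)]
shock_excess = 100.0

full = mean_excess_response(kernel, shock_excess, horizon=q)
direct_only = [0.0] + [k * shock_excess for k in kernel]

print("exponent of direct term :", local_exponent(direct_only, 10, 100))
print("exponent of full response:", local_exponent(full, 10, 100))
```

The direct term reproduces $\theta = \alpha$ by construction; comparing it with the full response shows how much the feedback of the relaxation on itself modifies the effective exponent for a given choice of parameters.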

Finally, we have emphasized the fact that QARCH models are by construction backward looking,
and predict clear Time Reversal Invariance (TRI) violations for the volatility/square-return
correlation function. Such violations are indeed observed empirically, although the magnitude of
the effect is much smaller than predicted. This suggests that QARCH models, which postulate
a deterministic relation between volatility and past returns, discard another important source of
fluctuations. We postulate that "the" grand model should include both ARCH-like feedback effects
and stochastic volatility effects, in such a way that TRI is only weakly broken. The stochastic
volatility component could be the source of the extra kurtosis of the residuals noted above (see footnote 8).
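
The TRI violation can be illustrated on simulated data. The following is a minimal sketch, assuming a diagonal ARCH model with Gaussian residuals; it compares the correlation of $\sigma_t^2$ with past squared returns $r_{t-\tau}^2$ and with future squared returns $r_{t+\tau}^2$. The kernel, sample size and seed are hypothetical choices, and the function names are not from the paper.

```python
import random


def simulate_arch(kernel, s2, n, seed=0):
    """Simulate sigma_t^2 = s2 + sum_tau kernel[tau-1] * r_{t-tau}^2, r_t = sigma_t * xi_t."""
    rng = random.Random(seed)
    q = len(kernel)
    sigma2, r2 = [], []
    for t in range(n):
        var = s2
        for tau in range(1, min(t, q) + 1):
            var += kernel[tau - 1] * r2[t - tau]
        xi = rng.gauss(0.0, 1.0)
        sigma2.append(var)
        r2.append(var * xi * xi)
    return sigma2, r2


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def vol_return_corr(sigma2, r2, lag):
    """Correlation of sigma2[t] with r2[t - lag]; lag > 0 looks at past returns."""
    if lag > 0:
        return correlation(sigma2[lag:], r2[:-lag])
    if lag < 0:
        return correlation(sigma2[:lag], r2[-lag:])
    return correlation(sigma2, r2)


# Hypothetical short power-law kernel whose weights sum to 0.5.
alpha, q = 1.11, 20
raw = [tau ** (-alpha) for tau in range(1, q + 1)]
kernel = [0.5 * w / sum(raw) for w in raw]

sigma2, r2 = simulate_arch(kernel, s2=1.0, n=20000, seed=42)
for lag in (1, 5, 10):
    print(lag, vol_return_corr(sigma2, r2, lag), vol_return_corr(sigma2, r2, -lag))
```

In the limiting case of a one-lag kernel, $\sigma_t^2$ is an exact affine function of $r_{t-1}^2$, so the backward-looking correlation at lag 1 equals one while the forward-looking one does not; this is the most extreme form of the asymmetry discussed in the text.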

The present paper is, to the best of our knowledge, the first attempt at unveiling the fine
structure of volatility feedback effects in autoregressive models. We believe that it is a step beyond
the traditional econometric approach of postulating a convenient mathematical model, which is
then brute-force calibrated on empirical data. What we really need is to identify the underlying
mechanisms that would justify, at a deeper level, the use of the QARCH family of models rather
than any other one, for example the multifractal random walk model. From this point of view,
we find it remarkable that the influence of daily returns is so strongly singled out by our empirical
results, when we expected that other time scales would emerge as well. The quandary lies in the
unexpectedly complex structure of the off-diagonal feedback component, for which we have no
interpretation.

A natural extension of our study that should shed further light on a possible behavioral inter- 
pretation of volatility feedback is to decompose daily returns into higher frequency components, 
for example overnight and intraday returns, or even 5 minute returns [TJ. Many other remain- 
ing questions should be addressed empirically, for example the dependence of the feedback effects 
on market capitalization, average volatility, etc. We have chosen to scale out the market-wide 
volatility, but other choices would be possible, such as a double regression on the past returns of 
stocks and of the index. Finally, other financial assets, such as currencies or commodities, should 
be studied as well. Stocks, however, offer the advantage that the data is much more abundant,
especially if one chooses to invoke some structural universality, and to treat all stocks as different
realizations of the same underlying process. 

Acknowledgements We thank R. Allez, P. Blanc, M. Potters and M. Virasoro for useful dis- 
cussions. 

A The exact spectrum of the BB model 

In this appendix, we give an account of the spectral properties of a continuous-time kernel with
the BB structure

$$K(\tau', \tau'') = k(\max(\tau', \tau'')), \qquad \epsilon < \tau', \tau'' < q.$$

The eigenvalue equation

$$\int_\epsilon^q K(\tau, \tau')\, v(\tau')\, d\tau' = \lambda\, v(\tau)$$

is differentiated to obtain a second order linear differential equation with appropriate boundary
conditions:

$$\lambda V''(\tau) = k'(\tau)\, V(\tau)$$

$$\lambda V'(q) = k(q)\, V(q) \qquad (35)$$

$$V(\epsilon) = 0$$

where $V(\tau) = \int_\epsilon^\tau v(\tau')\, d\tau'$. The resolution of this differential problem depends on the choice of $k(\tau)$,
and we investigate below three particular cases: a linearly, exponentially and power-law decreasing
kernel.
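
The differentiation step has an exact discrete-time counterpart, which can be checked numerically. The following is a minimal sketch, assuming the discrete kernel matrix $K_{ij} = k(\max(i, j))$ for lags $i, j = 1, \dots, q$ and the cumulative sum $V_i = \sum_{j \le i} v_j$. For any vector $v$ one has $(Kv)_{i+1} - (Kv)_i = (k(i+1) - k(i))\, V_i$ and $(Kv)_q = k(q)\, V_q$, which for an eigenvector $Kv = \lambda v$ are the discrete analogues of $\lambda V'' = k' V$ and $\lambda V'(q) = k(q) V(q)$. The function names and the test kernel are illustrative assumptions.

```python
import random


def bb_matrix(k, q):
    """K[i][j] = k(max(i, j)) for lags i, j = 1..q (stored with 0-based indices)."""
    return [[k(max(i, j)) for j in range(1, q + 1)] for i in range(1, q + 1)]


def apply_matrix(K, v):
    """Matrix-vector product K v."""
    return [sum(kij * vj for kij, vj in zip(row, v)) for row in K]


def difference_identity_error(k, q, v):
    """Largest violation of (Kv)_{i+1} - (Kv)_i = (k(i+1) - k(i)) * V_i for i = 1..q-1."""
    Kv = apply_matrix(bb_matrix(k, q), v)
    worst, cumulative = 0.0, 0.0
    for i in range(1, q):
        cumulative += v[i - 1]          # V_i = v_1 + ... + v_i
        lhs = Kv[i] - Kv[i - 1]         # (Kv)_{i+1} - (Kv)_i
        rhs = (k(i + 1) - k(i)) * cumulative
        worst = max(worst, abs(lhs - rhs))
    return worst


def boundary_identity_error(k, q, v):
    """Violation of the upper boundary relation (Kv)_q = k(q) * V_q."""
    Kv = apply_matrix(bb_matrix(k, q), v)
    return abs(Kv[-1] - k(q) * sum(v))


# Hypothetical test case: power-law kernel and a random vector.
def power_law(tau, g=0.1, alpha=1.15):
    return g * tau ** (-alpha)


rng = random.Random(0)
q = 50
v = [rng.uniform(-1.0, 1.0) for _ in range(q)]
print(difference_identity_error(power_law, q, v))
print(boundary_identity_error(power_law, q, v))
```

Both printed numbers should be at the level of floating-point rounding, since the identities hold for any vector and any kernel function.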

For the problem in continuous time, $\epsilon$ can be taken as 0, but in case this differential problem
is seen as an approximation to a discrete time problem, it is important to keep $\epsilon = 1$. For the sake
of simplicity of the solutions, we set $\epsilon = 0$ in the following, but more precise numerical results for
the first eigenmodes are obtained with $\epsilon = 1$ (see Fig. 12), although for higher modes and at large
$q$ the choice of $\epsilon$ is hardly relevant.

8 This discussion might be related to the interesting observation made by Virasoro in [45]. 
Linearly decreasing

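When $k(\tau) = g\,(1 - \tau/q)\,\mathbb{1}_{\{\tau \le q\}}$, the general solution is a superposition of the two linearly
independent solutions:

$$V_s(\tau) = \sin\left(\sqrt{\frac{g}{q\lambda}}\,\tau\right) \quad \text{and} \quad V_c(\tau) = \cos\left(\sqrt{\frac{g}{q\lambda}}\,\tau\right).$$

The boundary condition $V(0) = 0$ disqualifies $V_c$, and $V'(q) = 0$ (which follows from $k(q) = 0$) selects
only those values of $\lambda$ which satisfy

$$\lambda_n = \frac{g q}{\pi^2 \left(n - \frac{1}{2}\right)^2},$$

so that finally

$$v_n(\tau) = V_n'(\tau) \propto \cos\left(\left(n - \tfrac{1}{2}\right) \pi\, \frac{\tau}{q}\right).$$

In fact, we see straight from Eq. (35) that when $k'$ is constant, the problem for $V$ amounts to that
of a free quantum particle in a box with absorbing left wall and reflecting right wall.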
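
The closed-form spectrum can be compared with a direct numerical computation. The following is a minimal sketch, assuming the discrete matrix $K_{ij} = g\,(1 - \max(i, j)/q)$ for $i, j = 1, \dots, q$ and using plain power iteration (a swapped-in technique chosen to avoid external libraries, not necessarily the diagonalization method used for Fig. 12) to estimate the largest eigenvalue. It is compared with the continuous-time value $\lambda_1 = 4 g q / \pi^2$ obtained for $n = 1$. The parameters mirror those quoted for the linear case in Fig. 12; the function names are illustrative.

```python
import math


def bb_linear_matrix(g, q):
    """Discrete BB kernel K[i][j] = g * (1 - max(i, j) / q) for lags i, j = 1..q."""
    return [[g * (1.0 - max(i, j) / q) for j in range(1, q + 1)]
            for i in range(1, q + 1)]


def top_eigenvalue(K, iterations=60):
    """Largest eigenvalue of a symmetric non-negative matrix by power iteration."""
    n = len(K)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in K]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v.K.v for the normalized iterate
    Kv = [sum(a * b for a, b in zip(row, v)) for row in K]
    return sum(a * b for a, b in zip(v, Kv))


def theoretical_eigenvalue(g, q, n):
    """Continuous-time result lambda_n = g q / (pi^2 (n - 1/2)^2)."""
    return g * q / (math.pi ** 2 * (n - 0.5) ** 2)


g, q = 0.1, 200
numerical = top_eigenvalue(bb_linear_matrix(g, q))
theory = theoretical_eigenvalue(g, q, 1)
print("numerical lambda_1 :", numerical)
print("continuous lambda_1:", theory)
```

The two values are expected to differ only by a small discretization error, since the formula was derived for the continuous-time problem with $\epsilon = 0$.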



Exponentially decreasing 

When $k(\tau) = g\,e^{-\alpha\tau}$, the general solution is a superposition of the stretched Bessel functions

$$Z_0\left(\pm\gamma\, e^{-\alpha\tau/2}\right)$$

where $\gamma = 2\sqrt{g/(\alpha\lambda)}$. Bessel functions with negative argument are complex, so we keep only $+\gamma > 0$.
The Bessel functions $J_\nu$ and $Y_\nu$, of the first and second kind respectively, are linearly independent,
but the lower boundary condition imposes the coefficients of the combination

$$V(\tau) = Y_0(\gamma)\, J_0\left(\gamma\, e^{-\alpha\tau/2}\right) - J_0(\gamma)\, Y_0\left(\gamma\, e^{-\alpha\tau/2}\right).$$

Using recursion formulas for Bessel functions, the upper boundary condition becomes

$$Y_0(\gamma_n)\, J_2\left(\gamma_n e^{-\alpha q/2}\right) - J_0(\gamma_n)\, Y_2\left(\gamma_n e^{-\alpha q/2}\right) = 0$$

and the eigenvalues $\lambda_n$ are quantized according to the corresponding zeros.



Power-law decreasing 

Taking $k(\tau) = g\,\tau^{-\alpha}$, the solutions are given in terms of rescaled Bessel functions $Z_\nu$ (Gradshteyn
[23] 8.491.12):

$$V_\pm(\tau) = \sqrt{\tau}\; Z_\nu\left(\pm\gamma\, \tau^{\frac{1}{2\nu}}\right) \qquad (36)$$

with $\gamma = \frac{2}{|1-\alpha|}\sqrt{\frac{\alpha g}{\lambda}}$, $\nu = \frac{1}{1-\alpha}$. Bessel functions with negative argument are complex, so we keep
only $+\gamma > 0$. In the cases we consider, $\alpha > 1$ (see Section 5) so that $\nu < 0$. Although the
Bessel function of the first kind $J_\nu$ with negative non-integer $\nu$ diverges at the origin (as $x^\nu$), $V(\tau)$
vanishes linearly at $0^+$ since $V(\tau) \to \sqrt{\tau}\,(\tau^{\frac{1}{2\nu}})^{\nu} = \tau$. The first boundary condition is thus satisfied
for $Z_\nu = J_\nu$. Using recursion formulas for Bessel functions, the upper boundary condition becomes

$$J_{\nu-2}\left(\gamma_n\, q^{\frac{1}{2\nu}}\right) = 0$$

and the eigenvalues $\lambda_n$ are quantized according to the corresponding zeros.
Figure 12: First three eigenvectors of the BB Kernel ($q = 200$), from left to right. First row: linear
decay ($g = 0.1$); Second row: exponential decay ($g = 0.1$, $\alpha = 0.15$); Third row: power-law decay
($g = 0.1$, $\alpha = 1.15$). The theoretical solutions (red line) are hardly distinguishable from the results
of the numerical diagonalization (black dots).